THE USE OF CONSENSUS SEQUENCES FOR TARGETED HOMOLOGOUS
GENE ISOLATION AND RECOMBINATION IN GENE FAMILIES 

This is a continuing application of United States Application No. 60A)70734, filed December 11, 1997.

FIELD OF THE INVENTION

The invention relates to compositions and methods for targeting sequence modifications in one or more
genes of a related family of genes using enhanced homologous recombination techniques. The
invention also relates to compositions and methods for isolating and identifying novel members of
homologous sequence families. These techniques may be used to create animal or plant models of
disease as well as to identify new targets for drug or pathogen screening.

BACKGROUND 

Homologous recombination (or general recombination) is defined as the exchange of homologous
segments anywhere along a length of two DNA molecules. An essential feature of general
recombination is that the enzymes responsible for the recombination event can presumably use any
pair of homologous sequences as substrates, although some types of sequence may be favored over
others. Both genetic and cytological studies have indicated that such a crossing-over process occurs
between pairs of homologous chromosomes during meiosis in higher organisms.

Alternatively, in site-specific recombination, exchange occurs at a specific site, as in the integration of
phage λ into the E. coli chromosome and the excision of λ DNA from it. Site-specific recombination
involves specific inverted repeat sequences; e.g. the Cre-loxP and FLP-FRT systems. Within these
sequences there is only a short stretch of homology necessary for the recombination event, but not
sufficient for it. The enzymes involved in this event generally cannot recombine other pairs of
homologous (or nonhomologous) sequences, but act specifically.

Although both site-specific recombination and homologous recombination are useful mechanisms for
genetic engineering of DNA sequences, targeted homologous recombination provides a basis for
targeting and altering essentially any desired sequence in a duplex DNA molecule, such as targeting a
DNA sequence in a chromosome for replacement by another sequence. Site-specific recombination
has been proposed as one method to integrate transfected DNA at chromosomal locations having
specific recognition sites (O'Gorman et al. (1991) Science 251: 1351; Onouchi et al. (1991) Nucleic
Acids Res. 19: 6373). Unfortunately, since this approach requires the presence of specific target
sequences and recombinases, its utility for targeting recombination events at any particular
chromosomal location is severely limited in comparison to targeted general recombination.

Homologous recombination has also been used to create transgenic plants and animals. Transgenic
organisms contain stably integrated copies of genes or gene constructs derived from another species in
the chromosome of the transgenic organism. In addition, gene targeted animals can be generated by
introducing cloned DNA constructs of the foreign genes into totipotent cells by a variety of methods,
including homologous recombination. For example, animals that develop from genetically altered
totipotent cells can contain the foreign gene in all somatic cells and also in germ-line cells. Current
methods for producing transgenic and targeted animals have been performed on totipotent embryonic
stem (ES) cells and with fertilized zygotes. ES cells have an advantage in that large numbers of cells
can be manipulated easily by homologous recombination in vitro before they are used to generate
targeted animals. Currently, however, only embryonic stem cells from mice have been shown to
contribute to the germ line. Alternatively, DNA can also be introduced into fertilized oocytes by
microinjection into pronuclei, which are then transferred into the uterus of a pseudo-pregnant recipient
animal to develop to term.

The ability of mammalian and human cells to incorporate exogenous genetic material into genes
residing on chromosomes has demonstrated that these cells have the general enzymatic machinery for
carrying out homologous recombination required between resident and introduced sequences. These
targeted recombination events can be used to correct mutations at known sites, replace genes or gene
segments with defective ones, or introduce foreign genes into cells.

Traditionally, exogenous sequences transferred into eukaryotic cells undergo homologous
recombination with homologous endogenous sequences only at very low frequencies, and are so
inefficiently recombined that large numbers of cells must be transfected, selected, and screened in
order to generate a desired correctly targeted homologous recombinant (Kucherlapati et al. (1984)
Proc. Natl. Acad. Sci. (U.S.A.) 81: 3153; Smithies, O. (1985) Nature 317: 230; Song et al. (1987) Proc.
Natl. Acad. Sci. (U.S.A.) 84: 6820; Doetschman et al. (1987) Nature 330: 576; Kim and Smithies (1988)
Nucleic Acids Res. 16: 8887; Doetschman et al. (1988) op.cit.; Koller and Smithies (1989) op.cit.;
Shesely et al. (1991) Proc. Natl. Acad. Sci. (U.S.A.) 88: 4294; Kim et al. (1991) Gene 103: 227, which
are incorporated herein by reference).

Several proteins or purified extracts having the property of promoting homologous recombination (i.e.,
recombinase activity) have been identified in prokaryotes and eukaryotes (Cox and Lehman (1987) Ann.
Rev. Biochem. 56: 229; Radding, C.M. (1982) op.cit.; Madiraju et al. (1988) Proc. Natl. Acad. Sci.
(U.S.A.) 85: 6592; McCarthy et al. (1988) Proc. Natl. Acad. Sci. (U.S.A.) 85: 5854; Lopez et al. (1987)
op.cit., which are incorporated herein by reference). These general recombinases presumably promote
one or more steps in the formation of homologously-paired intermediates, strand-exchange, gene
conversion, and/or other steps in the process of homologous recombination.

The frequency of homologous recombination in prokaryotes is significantly enhanced by the presence
of recombinase activities. Several purified proteins catalyze homologous pairing and/or strand
exchange in vitro, including: E. coli recA protein, the T4 uvsX protein, the rec1 protein from Ustilago
maydis, and Rad51 protein from S. cerevisiae (Sung et al., Science 265:1241 (1994)) and human cells
(Baumann et al., Cell 87:757 (1996)). Additional members of this protein family have been identified by
homology and function, including Rad51A, B, C, D and E (Dosanjh et al. (1998) Nucleic Acids Res.
26: 1179-1184) and dmc1. Recombinases and dmc1, like the recA protein of E. coli, are proteins which
promote strand pairing and exchange. The most studied recombinase to date has been the recA
recombinase of E. coli, which is involved in homology search and strand exchange reactions (see Cox
and Lehman (1987) op.cit.). RecA is required for induction of the SOS repair response, DNA repair,
and efficient genetic recombination in E. coli. RecA can catalyze homologous pairing of a linear duplex
DNA and a homologous single strand DNA in vitro. In contrast to site-specific recombinases, proteins
like recA which are involved in general recombination recognize and promote pairing of DNA structures
on the basis of shared homology, as has been shown by several in vitro experiments (Hsieh and
Camerini-Otero (1989) J. Biol. Chem. 264: 5089; Howard-Flanders et al. (1984) Nature 309: 215;
Stasiak et al. (1984) Cold Spring Harbor Symp. Quant. Biol. 42: 561; Register et al. (1987) J. Biol.
Chem. 262: 12812). Several investigators have used recA protein in vitro to promote homologously
paired triplex DNA (Cheng et al. (1988) J. Biol. Chem. 263: 15110; Ferrin and Camerini-Otero (1991)
Science 254: 1494; Ramdas et al. (1989) J. Biol. Chem. 264: 11395; Strobel et al. (1991) Science 254:
1639; Hsieh et al. (1990) op.cit.; Rigas et al. (1986) Proc. Natl. Acad. Sci. (U.S.A.) 83: 9591; and
Camerini-Otero et al. U.S. 7,61 1 ^68, which are incorporated herein by reference).

Recent advances have resulted in techniques allowing enhanced homologous recombination (EHR)
using recombinases such as recA and Rad51 and single-stranded nucleic acids that have sequence
heterologies. This allows sequence modifications to be specifically targeted to virtually any genomic
position. See for example, PCT US93/03868 and PCT US98/05223, both of which are expressly
incorporated herein by reference.

One area of pressing interest in biology is within the area of "functional genomics", i.e. the correlation of
genotype and phenotype. This requires animal systems, since phenotypic changes must be evaluated
in vivo. Similarly, and related to this idea, is the elucidation and characterization of gene families, i.e.
genes or proteins that are structurally related, i.e. they have sequence homologies between the
members of the family. Since presumably many, if not most, disease states are caused by multiple
gene interactions, the ability to evaluate interactions among genes, and particularly within or between
gene families, at the phenotype level, would be extremely valuable.

The functional genomics tools that allow facile identification and engineering of gene family members in
animals and cells, however, are not yet available. While the amino acid sequence motifs shared
between gene family members may be identical, due to degeneracy in the DNA code, the DNA
sequence identity may be significantly less. Hence, one criterion necessary for genetic modifications of
gene family members is development of homologous recombination technologies that can be used to
clone and modify similar DNA sequences that share little sequence identity. This is particularly
important since homologous recombination in cells normally requires significant sequence identity to
work efficiently. Relaxing the amount of sequence identity needed for homologous recombination
allows greater flexibility to target related genes for creating transgenic animals and cells containing
modifications in gene family consensus sequences, and also will allow the rapid cloning, generation of
gene family specific libraries, and evolution of gene family members.
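
The gap between amino acid identity and DNA identity described above can be illustrated with a minimal sketch. The two DNA sequences used in the usage note are hypothetical examples chosen for illustration and are not taken from any gene family discussed herein; the function names are likewise illustrative. The sketch assumes the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (third base varies fastest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical pair: both encode Leu-Ser-Arg using different codons.
SEQ_A = "CTTTCTCGT"
SEQ_B = "TTAAGCAGA"
```

Comparing `identity(translate(SEQ_A), translate(SEQ_B))` with `identity(SEQ_A, SEQ_B)` shows the peptides are identical while the nucleotide sequences agree at only a minority of positions, which is the situation that motivates relaxing the identity requirement for recombination.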

Accordingly, it is an object of the present invention to provide compositions and methods for the 
evaluation and characterization of gene families and the role of individual and sets of genes in disease 
states. 

SUMMARY OF THE INVENTION 

It is an object of the present invention to provide compositions comprising at least one recombinase and
at least two single-stranded targeting polynucleotides which are substantially complementary to each
other and each having a consensus homology clamp for a gene family.

In an additional aspect, the invention provides compositions comprising at least one recombinase and a
plurality of pairs of single-stranded targeting polynucleotides, where the plurality of pairs comprises a
set of degenerate probes encoding the consensus sequence.

In a further aspect, the invention provides kits comprising the compositions of the invention and at least
one reagent.

In an additional aspect, the invention provides methods for targeting a sequence modification in at least
one member of a consensus family of genes in a cell by homologous recombination. The method
comprises introducing into at least one cell at least one recombinase and at least two single-stranded
targeting polynucleotides which are substantially complementary to each other and each having a
consensus homology clamp for the family. The method can additionally comprise identifying a target
cell having a targeted sequence modification.

In a further aspect, the invention provides methods of making a non-human organism with a targeted
sequence modification in at least one member of a gene family. The method comprises introducing
into a cell at least one recombinase and at least two single-stranded targeting polynucleotides which
are substantially complementary to each other and each having a consensus homology clamp for said
family. The cell is then subjected to conditions that result in the formation of an animal, and the animal
has at least one modification in at least one member of a consensus family of genes.

In an additional aspect, the invention provides methods of isolating a member of a gene family
comprising a protein consensus sequence. The method comprises adding to a complex mixture of
nucleic acids at least one recombinase and at least two single-stranded targeting polynucleotides which
are substantially complementary to each other and each having a consensus homology clamp for said
family. At least one of the targeting polynucleotides comprises a purification tag. The method is done
under conditions whereby the targeting polynucleotides form a complex with the member, and the
family member is isolated using said purification tag. The complex nucleic acid mixture may be a cDNA
library, a cell, RNA or a restriction endonuclease genomic digest.

In a further aspect, the invention provides non-human organisms containing a sequence modification in
an endogenous consensus functional domain of a gene member of a gene family.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figures 1A and 1B depict a table of protein families and consensus protein motifs. The gene family is a
family or subfamily with common function or sequence homology used to determine consensus motifs.
The motif is the amino acid consensus sequence common to the family members, and amino acid
portion is for the first human example. Parenthetical amino acids refers to all residues found at that
single position within the family. Members refers to the homologous (total and human members) used
to determine consensus sites. The degeneracy refers to the number and length of different
oligonucleotides needed in one synthesis to code for all the consensus amino acids used. Figure 1C
shows examples of DNA degeneracy.

Figure 2 depicts a schematic for gene family member isolation and modification. The degenerate
probe can be made by several different means including those shown. Libraries or linear nucleic acids
can be used for targeting. Capture can utilize a biotin moiety as shown or others, described in the text
and known in the art.

Figure 3 depicts gene family member targeting in animals and cells.

Figure 4 depicts 14-3-3 protein binding sites in different species and isoforms.

Figure 5 depicts the nucleic acid sequences encoding the human 14-3-3 binding sites.

Figure 6 depicts the protein consensus sequence for the modification of the 14-3-3 binding site. 

DETAILED DESCRIPTION OF THE INVENTION 

The present invention is directed to the use of homology motif tags (HMTs) in targeted homologous
recombination to elucidate disease mechanisms and to identify disease targets contained within gene
families related by the presence of one or more common domains. That is, there are a large number of
gene families that contain genes related by the presence of similar functional domains, i.e. binding
domains for substrates or other proteins, enzymatic domains such as kinase or protease domains,
signaling and regulatory domains, receptor binding domains, ATP binding domains, leucine zipper
domains, zinc finger domains, etc. These functional domains frequently result in primary sequence
homology; that is, related functional domains have related sequences. Many of these functional
domains have been studied and so-called "consensus sequences" identified; that is, an average
sequence derived from a number of related sequences. Each residue (or set of residues) of the
consensus sequence is the most frequent at that position in the set under consideration. Consensus
sequences can be either amino acid or nucleic acid consensus sequences, with amino acid sequences
being used to generate nucleic acid consensus sequences.
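
The derivation of a consensus sequence from related sequences, as just described, can be sketched as follows. This is a minimal illustration that assumes the sequences have already been aligned to equal length; the function names and the example alignment in the usage note are hypothetical and do not come from the figures. Two outputs are shown: the most frequent residue at each position, and a motif in which every residue observed at a variable position is listed in parentheses.

```python
from collections import Counter


def _columns(aligned):
    """Yield per-position residue counts for pre-aligned, equal-length sequences."""
    if not aligned:
        raise ValueError("at least one sequence is required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    for column in zip(*aligned):
        yield Counter(column)


def _ranked(counts):
    """Residues ordered by descending frequency, ties broken alphabetically."""
    return sorted(counts, key=lambda residue: (-counts[residue], residue))


def most_frequent_consensus(aligned):
    """Consensus built from the single most frequent residue at each position."""
    return "".join(_ranked(counts)[0] for counts in _columns(aligned))


def consensus_motif(aligned):
    """Motif listing all residues seen at a position, e.g. '(LF)' for L or F."""
    parts = []
    for counts in _columns(aligned):
        residues = _ranked(counts)
        parts.append(residues[0] if len(residues) == 1 else "(" + "".join(residues) + ")")
    return "".join(parts)
```

For a hypothetical alignment such as `["EAMLY", "EAMFF", "EAMLH", "EAMLY"]`, `most_frequent_consensus` yields a single artificial sequence, while `consensus_motif` retains the alternatives at the variable positions in parenthetical form.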

Interestingly, while a wide variety of gene families are known, the majority of drug targets come from
only four of these gene families. These are the G-protein coupled or seven-transmembrane domain
receptors, nuclear (hormone) receptors, ion channels, and esterases. Other important gene families are
enzymes, including recombinases. Of the top 100 pharmaceutical drugs, 18 bind to seven-
transmembrane receptors, 10 to nuclear receptors and 16 to ion channels.

By using HMTs directed to the consensus sequences of gene families for homologous recombination,
and particularly enhanced homologous recombination methods, sequence modifications may be made
to any number of targeted genes in a related family.

The present invention can thus be used in a variety of important ways. First, HMTs can be used in the
creation of transgenic animal and plant models of disease. Thus, for example, HMTs used in
homologous recombination methods can generate animals that have a wide variety of mutations in a
wide variety of related genes, potentially resulting in a wide variety of phenotypes, including phenotypes
related to disease states. This may also be done on a cellular level, to identify genes involved in
cellular phenotypes, i.e. target identification. Secondly, HMT targeting can be used in cells or animals
that are diseased or altered; in essence, HMT targeting can be done to identify "reversion" genes,
genes that can modulate disease states caused by different genes, either genes within the same gene
family or a completely different gene family. Thus, for example, the loss of one type of enzymatic
activity, resulting in a disease phenotype, may be compensated by alterations in a different but
homologous enzymatic activity. For example, the effects of the elimination of one kinase in a MAP
kinase cascade can be overcome by another parallel pathway.



Accordingly, the present invention provides methods and compositions utilizing homology motif tags
(HMTs) or consensus sequences. By "homology motif tag" or "protein consensus sequence" herein is
meant an amino acid consensus sequence of a gene family. By "consensus nucleic acid sequence"
herein is meant a nucleic acid that encodes a consensus protein sequence of a functional domain of a
gene family. In addition, "consensus nucleic acid sequence" can also refer to cis sequences that are
non-coding but can serve a regulatory or other role. As outlined below, generally a library of consensus
nucleic acid sequences is used that comprises a set of degenerate nucleic acids encoding the protein
consensus sequence. A wide variety of protein consensus sequences for a number of gene families
are known. A "gene family" therefore is a set of genes that encode proteins that contain a functional
domain for which a consensus sequence can be identified. However, in some instances, a gene family
includes non-coding sequences; for example, consensus regulatory regions can be identified. For
example, gene family/consensus sequence pairs are known for the G-protein coupled receptor family,
the AAA-protein family, the bZIP transcription factor family, the mutS family, the recA family, the Rad51
family, the dmc1 family, the recF family, the SH2 domain family, the Bcl-2 family, the single-stranded
binding protein family, the TFIID transcription family, the TGF-beta family, the TNF family, the XPA
family, the XPG family, actin binding proteins, bromodomain, GDP exchange factors, MCM family,
ser/thr phosphatase family, etc.



As will be appreciated by those in the art, the proteins of the gene families generally do not contain the
exact consensus sequences; generally consensus sequences are artificial sequences that represent
the best comparison of a variety of sequences. The actual sequence that corresponds to the functional
sequence within a particular protein is termed a "consensus functional domain" herein; that is, a
consensus functional domain is the actual sequence within a protein that corresponds to the consensus
sequence. A consensus functional domain may also be a "predetermined endogenous DNA
sequence" (also referred to herein as a "predetermined target sequence") that is a polynucleotide
sequence contained in a target cell. Such sequences can include, for example, chromosomal
sequences (e.g., structural genes, regulatory sequences including promoters and enhancers,
recombinatorial hotspots, repeat sequences, integrated proviral sequences, hairpins, palindromes),
episomal or extrachromosomal sequences (e.g., replicable plasmids or viral replication intermediates)
including chloroplast and mitochondrial DNA sequences. By "predetermined" or "pre-selected" it is
meant that the consensus functional domain target sequence may be selected at the discretion of the
practitioner on the basis of known or predicted sequence information, and is not constrained to specific
sites recognized by certain site-specific recombinases (e.g., FLP recombinase or CRE recombinase).
In some embodiments, the predetermined endogenous DNA target sequence will be other than a
naturally occurring germline DNA sequence (e.g., a transgene, parasitic, mycoplasmal or viral
sequence).

In a preferred embodiment, the gene family is the G-protein coupled receptor family, which has over
900 identified members, including several subfamilies. In a preferred embodiment, the G-protein
coupled receptors are from subfamily 1 and are also called R7G proteins. They are an extensive group
of receptors which recognize hormones, neurotransmitters, odorants and light and transduce
extracellular signals by interaction with guanine (G) nucleotide-binding proteins. The structure of all
these receptors is thought to be virtually identical, and they contain seven hydrophobic regions, each of
which putatively spans the membrane. The N-terminus is extracellular and is frequently glycosylated,
and the C-terminus is cytoplasmic and generally phosphorylated. Three extracellular loops alternate
with three cytoplasmic loops to link the seven transmembrane regions. G-protein coupled receptors
include, but are not limited to: the class A rhodopsin first subfamily, including amine (acetylcholine
(muscarinic), adrenoceptors, dopamine, histamine, serotonin, octopamine), peptides (angiotensin,
bombesin, bradykinin, C5a anaphylatoxin, Fmet-leu-phe, interleukin-8, chemokine, CCK, endothelin,
melanocortin, neuropeptide Y, neurotensin, opioid, somatostatin, tachykinin, thrombin, vasopressin-like,
galanin, proteinase activated), hormone proteins (follicle stimulating hormone, lutropin-
choriogonadotropic hormone, thyrotropin), rhodopsin (vertebrate), olfactory (olfactory type 1-11,
gustatory), prostanoid (prostaglandin, prostacyclin, thromboxane), nucleotide (adenosine,
purinoceptors), cannabis, platelet activating factor, gonadotropin-releasing hormone (gonadotropin
releasing hormone, thyrotropin-releasing hormone, growth hormone secretagogue), melatonin, viral
proteins, MHC receptor, Mas proto-oncogene, EBV-induced and glucocorticoid induced; the class B
secretin second subfamily, including calcitonin, corticotropin releasing factor, gastric inhibitory peptide,
glucagon, growth hormone releasing hormone, parathyroid hormone, secretin, vasoactive intestinal
polypeptide, and diuretic hormone; the class C metabotropic glutamate third subfamily, including
metabotropic glutamate and extracellular calcium-sensing agents; and the class D pheromone fourth
subfamily.

Because of the large number of family members, these large classes of GPCRs can be further
subdivided into subfamilies. Examples of these subfamilies are included in Figures 1A and 1B, where
metabotropic is from class C; calcitonin, glucagon, vasoactive and parathyroid are from class B; and
acetylcholine, histamine, angiotensin, α2- and β-adrenergic are from class A. From each subfamily
small protein consensus sequences can be derived from sequence alignments. For example, Figure
1A shows 6 motifs for the metabotropic glutamate like GPCRs derived from the indicated number of
family members. Figure 1C shows certain examples like the first "EAM(LF)(YFH)" using the single
letter amino acid code as is known in the art. Using the protein consensus sequence, degenerate
nucleic acid probes are made to encode the protein consensus sequence, as generally shown in
Figure 1, as is well known in the art. The protein sequence is encoded by DNA triplets which are
deduced using standard tables. In some cases additional degeneracy is used to enable production in
one oligonucleotide synthesis. In many cases motifs were chosen to minimize degeneracy. The
examples shown in Figures 1A-C were designed to facilitate use for amplification of neighboring
sequences as shown in Figure 2. This can utilize two motifs as indicated by faithful or error prone
amplification. Alternatively, outside sequences can be used as is indicated using vector sequence. In
addition, degenerate oligos can be synthesized and used directly in the procedure without amplification.
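
The back-translation of a protein consensus motif into degenerate DNA triplets can be sketched as follows. This is a minimal illustration assuming the standard genetic code; the function names are hypothetical, and the values it computes are not asserted to match those tabulated in the figures. It reports two quantities for a motif written in the parenthetical notation used above: the number of distinct DNA sequences that encode exactly the listed residues, and the degeneracy of a single mixed-base oligonucleotide in which each nucleotide position carries every base needed at that position. The second number can exceed the first, which is one way to read the "additional degeneracy" accepted to enable production in one synthesis.

```python
import re
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS_FOR = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(aa, []).append("".join(codon))


def parse_motif(motif):
    """Split 'EAM(LF)(YFH)' into [['E'], ['A'], ['M'], ['L', 'F'], ['Y', 'F', 'H']]."""
    tokens = re.findall(r"\(([A-Z]+)\)|([A-Z])", motif.replace(" ", ""))
    return [list(group or single) for group, single in tokens]


def codons_at(position):
    """All codons encoding any residue allowed at one motif position."""
    return {codon for aa in set(position) for codon in CODONS_FOR[aa]}


def exact_degeneracy(motif):
    """Number of distinct DNA sequences encoding exactly the listed residues."""
    return prod(len(codons_at(pos)) for pos in parse_motif(motif))


def mixed_base_degeneracy(motif):
    """Degeneracy of one oligo using a base mixture at each nucleotide position."""
    total = 1
    for pos in parse_motif(motif):
        codons = codons_at(pos)
        for i in range(3):
            total *= len({codon[i] for codon in codons})
    return total


def oligo_length(motif):
    """Length in nucleotides of the oligonucleotide encoding the motif."""
    return 3 * len(parse_motif(motif))
```

Calling `exact_degeneracy("EAM(LF)(YFH)")` and `mixed_base_degeneracy("EAM(LF)(YFH)")` gives the two counts for the example motif; the mixed-base oligo is larger because the base mixtures at the last position also admit leucine codons in addition to those for tyrosine, phenylalanine and histidine.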

As diagrammed in Figure 2, these double-stranded (ds) DNA probes are denatured and coated with RecA
or another recombinase such as Rad51. This material can be used to bind to and allow capture of
specific clones from cDNA or genomic libraries. Alternatively, this material can be introduced into cells,
producing transgenic cells or animals with alterations in related family members.

In addition to the first subfamily of G-protein coupled receptors, there is a second subfamily encoding
receptors that bind peptide hormones that do not show sequence similarity to the first R7G subfamily.
All the characterized receptors in this subfamily are coupled to G-proteins that activate both adenylyl
cyclase and the phosphatidylinositol-calcium pathway. However, they are structurally similar; like
classical R7G proteins they putatively contain seven transmembrane regions, a glycosylated
extracellular N-terminus and a cytoplasmic C-terminus. Known receptors in this subfamily are encoded
on multiple exons, and several of these genes are alternatively spliced to yield functionally distinct
products. The N-terminus contains five conserved cysteine residues putatively important in disulfide
bonds. Known G-protein coupled receptors in this subfamily are listed above.

In addition to the first and second subfamilies of G-protein coupled receptors, there is a third subfamily
encoding receptors that bind glutamate and calcium but do not show sequence similarity to either of the
other subfamilies. Structurally, this subfamily has signal sequences, very large hydrophobic
extracellular regions of about 540 to 600 amino acids that contain 17 conserved cysteines (putatively
involved in disulfides), a region of about 250 residues that appears to contain seven transmembrane
domains, and a C-terminal cytoplasmic domain of variable length (50 to 350 residues). Known
G-protein coupled receptors of this subfamily are listed above.

In a preferred embodiment, the gene family is the bZIP transcription factor family. This eukaryotic gene
family encodes DNA binding transcription factors that contain a basic region that mediates sequence
specific DNA binding, and a leucine zipper, required for dimerization. The bZIP family includes, but is
not limited to, AP-1, ATF, CREB, CREM, FOS, FRA, GBF, GCN4, HBP, JUN, MET4, OCS1, OP, TAF1,
XBP1, and YBBO.

In a preferred embodiment, the gene family is involved in DNA mismatch repair, such as mutL, hexB
and PMS1. Members of this family include, but are not limited to, MLH1, PMS1, PMS2, HexB and MutL.
The protein consensus sequence is G-F-R-G-E-A-L.

In a preferred embodiment, the gene family is the mutS family, also involved in mismatch repair of
DNA, directed to the correction of mismatched base pairs that have been missed by the proofreading
element of the DNA polymerase complex. MutS gene family members include, but are not limited to,
MSH2, MSH3, MSH6 and MutS.

In a preferred embodiment, the gene family is the recA family. The bacterial recA is essential for
homologous recombination and recombinatorial repair of DNA damage. RecA has many activities,
including the formation of nucleoprotein filaments, binding to single-stranded and double-stranded DNA,
binding and hydrolyzing ATP, recombinase activity and interaction with lexA causing lexA activation and
autocatalytic cleavage. RecA family members include those from E. coli, drosophila, human, lily, etc.,
specifically including but not limited to, E. coli recA, Rec1, Rec2, Rad51, Rad51B, Rad51C, Rad51D,
Rad51E, XRCC2 and DMC1.

In a preferred embodiment, the gene family is the recF family. The prokaryotic recF protein is a single-
stranded DNA binding protein which also putatively binds ATP. RecF is involved in DNA metabolism; it
is required for recombinatorial DNA repair and for induction of the SOS response. RecF is a protein of
about 350 to 370 amino acid residues; there is a conserved ATP-binding site motif "A" in the N-terminal
section of the protein as well as two other conserved regions, one located in the central section and the
other in the C-terminal section.

In a preferred embodiment, the gene family is the Bcl-2 family. Programmed cell death (PCD), or
apoptosis, is induced by events such as growth factor withdrawal and toxins. It is generally controlled by
regulators, which have either an inhibitory effect (i.e., anti-apoptotic) or block the protective effect of
inhibitors (pro-apoptotic). Many viruses have found a way of countering defensive apoptosis by
encoding their own anti-apoptotic genes, thereby preventing their target cells from dying too soon.

All proteins belonging to the Bcl-2 family contain at least one of a BH1, BH2, BH3 or BH4 domain. All
anti-apoptotic proteins contain BH1 and BH2 domains; some of them contain an additional N-terminal
BH4 domain (such as Bcl-2, Bcl-x(L), Bcl-W, etc.), which is generally not found in pro-apoptotic proteins
(with the exception of Bcl-x(S)). Generally all pro-apoptotic proteins contain a BH3 domain (except for
Bad), thought to be crucial for the dimerization of the proteins with other Bcl-2 family members and
crucial for their killing activity. In addition, some of the pro-apoptotic proteins contain BH1 and BH2
domains (such as Bax and Bak). The BH3 domain is also present in some anti-apoptosis proteins, such
as Bcl-2 and Bcl-x(L). Known Bcl-2 proteins include, but are not limited to, Bcl-2, Bcl-x(L), Bcl-W,
Bcl-x(S), Bad, Bax, and Bak.

In a preferred embodiment, the gene family is the site-specific recombinase family. Site-specific
recombination plays an important role in DNA rearrangement in prokaryotic organisms. Two types of
site-specific recombination are known to occur: a) recombination between inverted repeats resulting in
the reversal of a DNA segment; and b) recombination between repeat sequences on two DNA
molecules resulting in their cointegration, or between repeats on one DNA molecule resulting in the
excision of a DNA fragment. Site-specific recombination is characterized by a strand exchange
mechanism that requires no DNA synthesis or high energy cofactor; the phosphodiester bond energy is
conserved in a phospho-protein linkage during strand cleavage and re-ligation.

Two unrelated families of recombinases are currently known. The first, called the "phage integrase"
family, groups a number of bacterial, phage and yeast plasmid enzymes. The second, called the
"resolvase" family, groups enzymes which share the following structural characteristics: an N-terminal
catalytic and dimerization domain that contains a conserved serine residue involved in the transient
covalent attachment to DNA, and a C-terminal helix-turn-helix DNA-binding domain.

In a preferred embodiment, the gene family is the single-stranded binding protein family. The E. coli
single-stranded binding protein (ssb), also known as the helix-destabilizing protein, is a protein of 177
amino acids. It binds tightly as a homotetramer to single-stranded DNA (ss-DNA) and plays an
important role in DNA replication, recombination and repair. Members of the ssb family include, but are
not limited to, E. coli ssb and eukaryotic RPA proteins.

In a preferred embodiment, the gene family is the TFIID transcription family. Transcription factor TFIID
(or TATA-binding protein, TBP) is a general factor that plays a major role in the activation of eukaryotic
genes transcribed by RNA polymerase II. TFIID binds specifically to the TATA box promoter element
which lies close to the position of transcription initiation. There is a remarkable degree of sequence
conservation of a C-terminal domain of about 180 residues in TFIID from various eukaryotic sources.
This region is necessary and sufficient for TATA box binding. The most significant structural feature of
this domain is the presence of two conserved repeats of a 77 amino-acid region.

In a preferred embodiment, the gene family is the TGF-β family. Transforming growth factor-β (TGF-β)
is a multifunctional protein that controls proliferation, differentiation and other functions in many cell
types. TGF-β-1 is a protein of 112 amino acid residues derived by proteolytic cleavage from the
C-terminal portion of the precursor protein. Members of the TGF-β family include, but are not limited to,
the TGF-1-3 subfamily (including TGF1, TGF2, and TGF3); the BMP3 subfamily (BM3B, BMP3); the
BMP5-8 subfamily (BM8A, BMP5, BMP6, BMP7, and BMP8); and the BMP 2 & 4 subfamily (BMP2,
BMP4, DECA).

Some protein consensus sequences of the TGF-β family are shown in Figure 1.

In a preferred embodiment, the gene family is the TNF family. A number of cytokines can be grouped
into a family on the basis of amino acid sequence, as well as structural and functional similarities.
These include (1) tumor necrosis factor (TNF), also known as cachectin or TNF-α, which is a cytokine
with a wide variety of functions. TNF-α can cause cytolysis of certain tumor cell lines; it is implicated in
the induction of cachexia; it is a potent pyrogen, causing fever by direct action or by stimulation of
interleukin-1 secretion; and it can stimulate cell proliferation and induce cell differentiation under certain
conditions; (2) lymphotoxin-α (LT-α) and lymphotoxin-β (LT-β), two related cytokines produced by
lymphocytes and which are cytotoxic for a wide range of tumor cells in vitro and in vivo; (3) T cell
antigen gp39 (CD40L), a cytokine that seems to be important in B-cell development and activation; (4)
CD27L, a cytokine that plays a role in T-cell activation; it induces the proliferation of costimulated T cells
and enhances the generation of cytolytic T cells; (5) CD30L, a cytokine that induces proliferation of
T-cells; (6) FASL, a cytokine involved in cell death; (8) 4-1BBL, an inducible T cell surface molecule that
contributes to T-cell stimulation; (9) OX40L, a cytokine that co-stimulates T cell proliferation and
cytokine production; and (10) TNF-related apoptosis inducing ligand (TRAIL), a cytokine that induces
apoptosis.

In a preferred embodiment, the gene family is the XPA family. Xeroderma pigmentosum (XP) is a human
autosomal recessive disease, characterized by a high incidence of sunlight-induced skin cancer. Skin
cells associated with this condition are hypersensitive to ultraviolet light, due to defects in the incision
step of DNA excision repair. There are a minimum of 7 genetic complementation groups involved in
this disorder: XPA to XPG. XPA is the most common form of the disease and is due to defects in a 30
kD nuclear protein called XPA (or XPAC). The sequence of XPA is conserved from higher eukaryotes
to yeast (gene RAD14). XPA is a hydrophilic protein of 247 to 296 amino acid residues that has a
C4-type zinc finger motif in its central section.

In a preferred embodiment, the gene family is the XPG family. The defect in XPG can be corrected by
a 133 kD nuclear protein called XPG (or XPGC). Members of the XPG family include, but are not
limited to, FEN1, XPG, RAD2, EXO1, and DIN7.

Once a gene family and a consensus sequence have been identified, the compositions of the invention
can be made. The compositions of the invention comprise at least one recombinase and at least two
single-stranded targeting polynucleotides which are substantially complementary to each other and
each have a consensus homology clamp for a gene family.
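
As an illustration of how a consensus sequence might be derived once the members of a gene family
have been aligned, the following Python sketch takes a simple per-column majority vote. The function
name `consensus`, the 50 percent threshold, the `N` placeholder for ambiguous positions, and the three
example sequences are assumptions made for illustration only; they are not taken from this
specification.

```python
from collections import Counter


def consensus(aligned, threshold=0.5, ambiguous="N"):
    """Return a majority-rule consensus for equal-length, pre-aligned sequences.

    A column contributes its most common base only if that base occurs in
    more than `threshold` of the sequences; otherwise `ambiguous` is used.
    """
    if not aligned:
        raise ValueError("at least one sequence is required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must already be aligned to equal length")

    result = []
    for column in zip(*aligned):
        base, count = Counter(column).most_common(1)[0]
        result.append(base if count / len(column) > threshold else ambiguous)
    return "".join(result)


# Hypothetical aligned fragments from three family members.
family = ["ATGGCGTTA", "ATGGCATTA", "ATGACGTTG"]
print(consensus(family))  # prints ATGGCGTTA
```

A practitioner would normally obtain the alignment from a dedicated alignment tool first; the sketch
covers only the final voting step.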

By "recombinase" herein is meant a protein that, when included with an exogenous targeting
polynucleotide, provides a measurable increase in the recombination frequency and/or localization
frequency between the targeting polynucleotide and an endogenous predetermined DNA sequence.
Thus, in a preferred embodiment, increases in recombination frequency from the normal range of 10^
to10^,to10^to10\ preferably 10^ to 10\ and most preferably 10^^ to 10°, may be achieved. 

In the present invention, recombinase refers to a family of RecA-like recombination proteins all having
essentially all or most of the same functions, particularly: (i) the recombinase protein's ability to
properly bind to and position targeting polynucleotides on their homologous targets and (ii) the ability of
recombinase protein/targeting polynucleotide complexes to efficiently find and bind to complementary
endogenous sequences. The best characterized recA protein is from E. coli; in addition to the wild-type
protein, a number of mutant recA proteins have been identified (e.g., recA803; see Madiraju et al.,
PNAS USA 85(18):6592 (1988); Madiraju et al., Biochem. 31:10529 (1992); Lavery et al., J. Biol. Chem.
267:20648 (1992)). Further, many organisms have recA-like recombinases with strand-transfer
activities (e.g., Fugisawa et al., (1985) Nucl. Acids Res. 13: 7473; Hsieh et al., (1986) 885;
Hsieh et al., (1989) J. Biol. Chem. 264: 5089; Fishel et al., (1988) Proc. Natl. Acad. Sci. (USA) 85:
3683; Cassuto et al., (1987) Mol. Gen. Genet. 208: 10; Ganea et al., (1987) Mol. Cell. Biol. 7: 3124;
Moore et al., (1990) J. Biol. Chem. 265: 11108; Keene et al., (1984) Nucl. Acids Res. 12: 3057; Kmiec,
(1984) Cold Spring Harbor Symp. 48: 675; Kmiec, (1986) Cell 44: 545; Kolodner et al., (1987) Proc.
Natl. Acad. Sci. USA 84: 5560; Sugino et al., (1985) Proc. Natl. Acad. Sci. USA 85: 3683; Halbrook et
al., (1989) J. Biol. Chem. 264: 21403; Eisen et al., (1988) Proc. Natl. Acad. Sci. USA 85: 7481;
McCarthy et al., (1988) Proc. Natl. Acad. Sci. USA 85: 5854; Lowenhaupt et al., (1989) J. Biol. Chem.
264: 20568, which are incorporated herein by reference). Examples of such recombinase proteins
include, for example but not limited to: recA, recA803, uvsX, and other recA mutants and recA-like
recombinases (Roca, A. I. (1990) Crit. Rev. Biochem. Molec. Biol. 25: 415), sep1 (Kolodner et al.
(1987) Proc. Natl. Acad. Sci. (U.S.A.) 84: 5560; Tishkoff et al. Molec. Cell. Biol. 11: 2593), RuvC
(Dunderdale et al. (1991) Nature 354: 506), DST2, KEM1, XRN1 (Dykstra et al. (1991) Molec. Cell.
Biol. 11: 2583), STPα/DST1 (Clark et al. (1991) Molec. Cell. Biol. 11: 2576), HPP-1 (Moore et al. (1991)
Proc. Natl. Acad. Sci. (U.S.A.) 88: 9067), other target recombinases (Bishop et al. (1992) Cell 69: 439;
Shinohara et al. (1992) Cell 69: 457); incorporated herein by reference. RecA may be purified from
E. coli strains, such as E. coli strains JC12772 and JC15369 (available from A.J. Clark and M. Madiraju,
University of California-Berkeley, or purchased commercially). These strains contain the recA coding
sequences on a "runaway" replicating plasmid vector present at a high copy number per cell. The
recA803 protein is a high-activity mutant of wild-type recA. The art teaches several examples of
recombinase proteins, for example, from Drosophila, yeast, plant, human, and non-human mammalian
cells, including proteins with biological properties similar to recA (i.e., recA-like recombinases), such as
Rad51, Rad57, dmc1 from mammals and yeast, and Pk-rec (see Rashid et al., Nucleic Acids Res.
25(4):719 (1997), hereby incorporated by reference). In addition, the recombinase may actually be a
complex of proteins, i.e. a "recombinosome". In addition, included within the definition of a recombinase
are portions or fragments of recombinases which retain recombinase biological activity, as well as
variants or mutants of wild-type recombinases which retain biological activity, such as the E. coli
recA803 mutant with enhanced recombinase activity.

In a preferred embodiment, recA or rad51 is used. For example, recA protein is typically obtained from
bacterial strains that overproduce the protein: wild-type E. coli recA protein and mutant recA803 protein
may be purified from such strains. Alternatively, recA protein can also be purchased from, for example,
Pharmacia (Piscataway, NJ) or Boehringer Mannheim (Indianapolis, Indiana).

RecA protein, and its homologs, form a nucleoprotein filament when it coats a single-stranded DNA.
In this nucleoprotein filament, one monomer of recA protein is bound to about 3 nucleotides. This
property of recA to coat single-stranded DNA is essentially sequence independent, although particular
sequences favor initial loading of recA onto a polynucleotide (e.g., nucleation sequences). The
nucleoprotein filament(s) can be formed on essentially any DNA molecule and can be formed in cells
(e.g., mammalian cells), forming complexes with both single-stranded and double-stranded DNA,
although the loading conditions for dsDNA are somewhat different than for ssDNA.
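
The stoichiometry stated above (one recA monomer per about 3 nucleotides) allows a rough estimate
of how many monomers are needed to coat a single-stranded polynucleotide. The Python sketch below
is only an approximation under that stated ratio; the function name `estimate_reca_monomers` and the
200-nucleotide example are hypothetical.

```python
import math


def estimate_reca_monomers(length_nt, nt_per_monomer=3):
    """Approximate number of recA monomers needed to coat an ssDNA.

    Assumes the ratio given in the text: about one monomer per 3 nucleotides.
    The result is rounded up so that the whole strand is covered.
    """
    if length_nt < 0:
        raise ValueError("length_nt must be non-negative")
    if nt_per_monomer <= 0:
        raise ValueError("nt_per_monomer must be positive")
    return math.ceil(length_nt / nt_per_monomer)


# Hypothetical 200-nucleotide targeting polynucleotide.
print(estimate_reca_monomers(200))  # prints 67
```

Actual loading conditions depend on the experimental protocol, so the figure is a starting estimate
rather than a prescribed amount.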

The recombinase is combined with targeting polynucleotides as is more fully outlined below. By
"nucleic acid" or "oligonucleotide" or "polynucleotide" or grammatical equivalents herein is meant at least
two nucleotides covalently linked together. A nucleic acid of the present invention will generally contain
phosphodiester bonds, although in some cases nucleic acid analogs are included that may have
alternate backbones, comprising, for example, phosphoramide (Beaucage et al., Tetrahedron
49(10):1925 (1993) and references therein; Letsinger, J. Org. Chem. 35:3800 (1970); Sprinzl et al.,
Eur. J. Biochem. 81:579 (1977); Letsinger et al., Nucl. Acids Res. 14:3487 (1986); Sawai et al., Chem.
Lett. 805 (1984); Letsinger et al., J. Am. Chem. Soc. 110:4470 (1988); and Pauwels et al., Chemica
Scripta 26:141 (1986)), phosphorothioate, phosphorodithioate, O-methylphosphoroamidite linkages (see
Eckstein, Oligonucleotides and Analogues: A Practical Approach, Oxford University Press), and peptide
nucleic acid backbones and linkages (see Egholm, J. Am. Chem. Soc. 114:1895 (1992); Meier et al.,
Chem. Int. Ed. Engl. 31:1008 (1992); Nielsen, Nature 365:566 (1993); Carlsson et al., Nature 380:207
(1996), all of which are incorporated by reference). These modifications of the ribose-phosphate
backbone or bases may be done to facilitate the addition of other moieties such as chemical
constituents, including 2' O-methyl and 5' modified substituents, as discussed below, or to increase the
stability and half-life of such molecules in physiological environments.

The nucleic acids may be single stranded or double stranded, as specified, or contain portions of both
double stranded or single stranded sequence. The nucleic acid may be DNA, both genomic and cDNA,
RNA or a hybrid, where the nucleic acid contains any combination of deoxyribo- and ribo-nucleotides,
and any combination of bases, including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine
and hypoxanthine, etc. Thus, for example, chimeric DNA-RNA molecules may be used such as
described in Cole-Strauss et al., Science 273:1386 (1996) and Yoon et al., PNAS USA 93:2071 (1996),
both of which are hereby incorporated by reference.

In general, the targeting polynucleotides may comprise any number of structures, as long as the
changes do not substantially affect the functional ability of the targeting polynucleotide to result in
homologous recombination. For example, recombinase coating of alternate structures should still be
able to occur.

By "targeting polynucleotides" herein is meant the polynucleotides used to make alterations in the
consensus functional domains of members of gene families as described herein. Targeting
polynucleotides are generally ssDNA or dsDNA, most preferably two complementary single-stranded
DNAs.

Targeting polynucleotides are generally at least about 5 to 2000 nucleotides long, preferably about 12
to 200 nucleotides long, at least about 200 to 500 nucleotides long, more preferably at least about 500
to 2000 nucleotides long, or longer; however, as the length of a targeting polynucleotide increases
beyond about 20,000 to 50,000 to 400,000 nucleotides, the efficiency of transferring an intact targeting
polynucleotide into the cell decreases. The length of homology may be selected at the discretion of the
practitioner on the basis of the sequence composition and complexity of the predetermined endogenous
target DNA sequence(s) and guidance provided in the art, which generally indicates that 1.3 to 6.8
kilobase segments of homology are preferred when non-recombinase mediated methods are utilized
(Hasty et al. (1991) Molec. Cell. Biol. 11: 5586; Shulman et al. (1990) Molec. Cell. Biol. 10: 4466, which
are incorporated herein by reference).

Targeting polynucleotides have at least one sequence that substantially corresponds to, or is
substantially complementary to, a consensus functional domain, i.e. the predetermined endogenous
DNA sequence (i.e., a DNA sequence of a polynucleotide located in a target cell, such as a
chromosomal, mitochondrial, chloroplast, viral, extrachromosomal, or mycoplasmal polynucleotide).
By "corresponds to" herein is meant that a polynucleotide sequence is homologous (i.e., may be similar
or identical, not strictly evolutionarily related) to all or a portion of a reference polynucleotide sequence,
or that a polypeptide sequence is identical to a reference polypeptide sequence. In contradistinction,
the term "complementary to" is used herein to mean that the complementary sequence can hybridize to
all or a portion of a reference polynucleotide sequence. Thus, one of the complementary single
stranded targeting polynucleotides is complementary to one strand of the endogenous target
consensus sequence (i.e. Watson) and corresponds to the other strand of the endogenous target
consensus sequence (i.e. Crick). Thus, the complementarity between two single-stranded targeting
polynucleotides need not be perfect. For illustration, the nucleotide sequence "TATAC" corresponds to
a reference sequence "TATAC" and is perfectly complementary to a reference sequence "GTATA".
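
The "TATAC"/"GTATA" illustration can be checked mechanically by computing a reverse complement.
The Python sketch below does so for DNA sequences written 5' to 3'; the helper names
`reverse_complement` and `is_perfectly_complementary` are hypothetical and handle only the four
standard bases.

```python
# Translation table for the four standard DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_perfectly_complementary(query, reference):
    """True if `query` can pair with every base of `reference` (antiparallel)."""
    return reverse_complement(query).upper() == reference.upper()


print(reverse_complement("TATAC"))                   # prints GTATA
print(is_perfectly_complementary("TATAC", "GTATA"))  # prints True
```

Because the definition in the text also admits imperfect complementarity, a perfect-match test like
this one is the strictest case only.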

The terms "substantially corresponds to" or "substantial identity" or "homologous" as used herein
denote a characteristic of a nucleic acid sequence, wherein a nucleic acid sequence has at least about
50 percent sequence identity as compared to a reference sequence, typically at least about 70 percent
sequence identity, and preferably at least about 85 percent sequence identity as compared to a
reference sequence. The percentage of sequence identity is calculated excluding small deletions or
additions which total less than 25 percent of the reference sequence. The reference sequence may be
a subset of a larger sequence, such as a portion of a gene or flanking sequence, or a repetitive portion
of a chromosome. However, the reference sequence is at least 18 nucleotides long, typically at least
about 30 nucleotides long, and preferably at least about 50 to 100 nucleotides long. "Substantially
complementary" as used herein refers to a sequence that is complementary to a sequence that
substantially corresponds to a reference sequence. In general, targeting efficiency increases with the
length of the targeting polynucleotide portion that is substantially complementary to a reference
sequence present in the target DNA.
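
The identity thresholds above (about 50, 70 and 85 percent) can be applied with a short calculation.
The Python sketch below is a simplified illustration: it assumes the two sequences have already been
aligned to equal length and it ignores the exclusion of small deletions or additions described in the
text. The function names and tier labels are hypothetical.

```python
def percent_identity(query, reference):
    """Percent of positions that match between two equal-length sequences."""
    if not reference:
        raise ValueError("reference must not be empty")
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(q == r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def identity_tier(pct):
    """Map a percent identity onto the thresholds named in the text."""
    if pct >= 85:
        return "preferred"
    if pct >= 70:
        return "typical"
    if pct >= 50:
        return "minimum"
    return "below threshold"


pct = percent_identity("ACGTACGTAC", "ACGTACGTAA")
print(pct, identity_tier(pct))  # prints 90.0 preferred
```

The ten-base example is shorter than the 18-nucleotide minimum reference length stated above and is
used only to keep the arithmetic easy to follow.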

These corresponding/complementary sequences are referred to herein as "consensus homology
clamps", as they serve as templates for homologous pairing with the predetermined endogenous
sequence(s). Thus, a "consensus homology clamp" is a portion of the targeting polynucleotide that can
specifically hybridize to a consensus functional domain within a gene of interest. "Specific hybridization"
is defined herein as the formation of hybrids between a targeting polynucleotide (e.g., a polynucleotide
of the invention which may include substitutions, deletions, and/or additions as compared to the
predetermined target nucleic acid sequence) and a predetermined target nucleic acid, wherein the
targeting polynucleotide preferentially hybridizes to the predetermined target nucleic acid such that, for
example, at least one discrete band can be identified on a Southern blot of nucleic acid prepared from
target cells that contain the target nucleic acid sequence, and/or a targeting polynucleotide in an intact
nucleus localizes to a discrete chromosomal location characteristic of a unique or repetitive sequence.
As will be appreciated by those in the art, a target consensus functional domain sequence may be
present in more than one target polynucleotide species (e.g., a particular target sequence may occur in
multiple members of a gene family). It is evident that optimal hybridization conditions will vary
depending upon the sequence composition and length(s) of the targeting polynucleotide(s) and
target(s), and the experimental method selected by the practitioner. Various guidelines may be used to
select appropriate hybridization conditions (see Maniatis et al., Molecular Cloning: A Laboratory
Manual (1989), 2nd Ed., Cold Spring Harbor, N.Y. and Berger and Kimmel, Methods in Enzymology,
Volume 152, Guide to Molecular Cloning Techniques (1987), Academic Press, Inc., San Diego, CA),
which are incorporated herein by reference. Methods for hybridizing a targeting polynucleotide to a
discrete chromosomal location in intact nuclei are known in the art; see for example WO 93/05177 and
Kowalczykowski and Zarling (1994) in Gene Targeting, Ed. Manuel Vega.

In targeting polynucleotides, such consensus homology clamps are typically located at or near the 5' or
3' end; preferably consensus homology clamps are internal or located at each end of the
polynucleotide (Berinstein et al. (1992) Molec. Cell. Biol. 12: 360, which is incorporated herein by
reference). Without wishing to be bound by any particular theory, it is believed that the addition of
recombinases permits efficient gene targeting with targeting polynucleotides having short (i.e., about 10
to 1000 basepair long) segments of homology, as well as with targeting polynucleotides having longer
segments of homology.

Therefore, it is preferred that targeting polynucleotides of the invention have consensus homology
clamps that are highly homologous to the predetermined target endogenous consensus functional
domain nucleic acid sequence(s). Typically, targeting polynucleotides of the invention have at least one
consensus homology clamp that is at least about 18 to 35 nucleotides long, and it is preferable that
consensus homology clamps are at least about 20 to 100 nucleotides long, and more preferably at
least about 100-500 nucleotides long, although the degree of sequence homology between the
consensus homology clamp and the targeted sequence and the base composition of the targeted
sequence will determine the optimal and minimal clamp lengths (e.g., G-C rich sequences are typically
more thermodynamically stable and will generally require shorter clamp length). Therefore, both
consensus homology clamp length and the degree of sequence homology can only be determined with
reference to a particular predetermined sequence, but consensus homology clamps generally must be
at least about 10 nucleotides long and must also substantially correspond or be substantially
complementary to a predetermined target sequence. Preferably, a homology clamp is at least about
10, and preferably at least about 50 nucleotides long and is substantially identical to or complementary
to a predetermined target sequence. Without wishing to be bound by a particular theory, it is believed
that the addition of recombinases to a targeting polynucleotide enhances the efficiency of homologous
recombination between homologous, nonisogenic sequences (e.g., between an exon 2 sequence of an
albumin gene of a Balb/c mouse and a homologous albumin gene exon 2 sequence of a C57/BL6
mouse), as well as between isogenic sequences.
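
Because clamp length and base composition are considered together, a practitioner may find it
convenient to compute both for a candidate clamp. The Python sketch below reports G-C fraction and
checks the "at least about 10 nucleotides" floor stated above. The function names, the returned
dictionary layout, and the example sequence are hypothetical, and the sketch does not attempt to
predict an optimal clamp length.

```python
def gc_fraction(seq):
    """Fraction of bases in `seq` that are G or C (0.0 to 1.0)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def summarize_clamp(seq, minimum_length=10):
    """Report length, G-C fraction, and whether the minimum length is met."""
    return {
        "length": len(seq),
        "gc_fraction": gc_fraction(seq),
        "meets_minimum_length": len(seq) >= minimum_length,
    }


# Hypothetical 20-nucleotide candidate clamp.
print(summarize_clamp("GGCCATGGCCATGGCCATGC"))
```

A higher G-C fraction suggests, per the text, that a somewhat shorter clamp may suffice, but the
acceptable length still has to be judged against the particular target sequence.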

The formation of heteroduplex joints is not a stringent process; genetic evidence supports the view that
the classical phenomena of meiotic gene conversion and aberrant meiotic segregation result in part
from the inclusion of mismatched base pairs in heteroduplex joints, and the subsequent correction of
some of these mismatched base pairs before replication. Observations on recA protein have provided
information on parameters that affect the discrimination of relatedness from perfect or near-perfect
homology and that affect the inclusion of mismatched base pairs in heteroduplex joints. The ability of
recA protein to drive strand exchange past all single base-pair mismatches and to form extensively
mismatched joints in superhelical DNA reflects its role in recombination and gene conversion. This
error-prone process may also be related to its role in mutagenesis. RecA-mediated pairing reactions
involving DNA of ΦX174 and G4, which are about 70 percent homologous, have yielded homologous
recombinants (Cunningham et al. (1981) Cell 24: 213), although recA preferentially forms homologous
joints between highly homologous sequences, and is implicated as mediating a homology search
process between an invading DNA strand and a recipient DNA strand, producing relatively stable
heteroduplexes at regions of high homology. Accordingly, it is the fact that recombinases can drive the
homologous recombination reaction between strands which are significantly, but not perfectly,
homologous, which allows gene conversion and the modification of target sequences. Thus, targeting
polynucleotides may be used to introduce nucleotide substitutions, insertions and deletions into an
endogenous consensus functional domain nucleic acid sequence, and thus the corresponding amino
acid substitutions, insertions and deletions in proteins expressed from the endogenous consensus
functional domain nucleic acid sequence. By "endogenous" in this context herein is meant the naturally
occurring sequence, i.e. sequences or substances originating from within a cell or organism. Similarly,
"exogenous" refers to sequences or substances originating outside the cell or organism.

In a preferred embodiment, two substantially complementary targeting polynucleotides are used. In one
embodiment, the targeting polynucleotides form a double stranded hybrid, which may be coated with
recombinase, although when the recombinase is recA, the loading conditions may be somewhat
different from those used for single stranded nucleic acids.

In a preferred embodiment, two substantially complementary single-stranded targeting polynucleotides
are used. The two complementary single-stranded targeting polynucleotides are usually of equal
length, although this is not required. However, as noted below, the stability of the four strand hybrids of
the invention is putatively related, in part, to the lack of significant unhybridized single-stranded nucleic
acid, and thus significant unpaired sequences are not preferred. Furthermore, as noted above, the
complementarity between the two targeting polynucleotides need not be perfect. The two
complementary single-stranded targeting polynucleotides are simultaneously or contemporaneously
introduced into a target cell harboring a predetermined endogenous target sequence, generally with at
least one recombinase protein (e.g., recA). Under most circumstances, it is preferred that the targeting
polynucleotides are incubated with recA or other recombinase prior to introduction into a target cell, so
that the recombinase protein(s) may be "loaded" onto the targeting polynucleotide(s), to coat the
nucleic acid, as is described below. Incubation conditions for such recombinase loading are described
infra, and also in U.S.S.N. 07/755,462, filed 4 September 1991; U.S.S.N. 07/910,791, filed 9 July 1992;
and U.S.S.N. 07/520,321, filed 7 May 1990, each of which is incorporated herein by reference. A
targeting polynucleotide may contain a sequence that enhances the loading process of a recombinase;
for example, a recA loading sequence is the recombinogenic nucleation sequence poly[d(A-C)], and its
complement poly[d(G-T)]. The duplex sequence poly[d(A-C)·d(G-T)]n, where n is from 5 to 25, is a
middle repetitive element in target DNA.
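
To locate candidate nucleation sequences of this kind, one can scan a strand for uninterrupted runs
of the dinucleotide AC. The Python sketch below reports runs whose repeat count n falls within the 5
to 25 range given above. The function name `find_ac_repeats` and the test sequence are hypothetical,
and only the strand supplied is scanned; GT runs on that strand would have to be searched for
separately.

```python
import re

# Matches a maximal uninterrupted run of the dinucleotide "AC".
_AC_RUN = re.compile(r"(?:AC)+")


def find_ac_repeats(seq, n_min=5, n_max=25):
    """Return (start_index, repeat_count) for each (AC)n run with n in range."""
    hits = []
    for match in _AC_RUN.finditer(seq.upper()):
        repeats = (match.end() - match.start()) // 2
        if n_min <= repeats <= n_max:
            hits.append((match.start(), repeats))
    return hits


# Hypothetical sequence: one run of six repeats and one run of three.
example = "GGT" + "AC" * 6 + "TTT" + "AC" * 3 + "G"
print(find_ac_repeats(example))  # prints [(3, 6)]
```

Runs are measured as maximal stretches, so a run longer than 25 repeats is excluded rather than
truncated to fit the range.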

There appears to be a fundamental difference in the stability of RecA-protein-mediated D-loops formed
between one single-stranded DNA (ssDNA) probe hybridized to negatively supercoiled DNA targets in
comparison to relaxed or linear duplex DNA targets. Internally located dsDNA target sequences on
relaxed linear DNA targets hybridized by ssDNA probes produce single D-loops, which are unstable
after removal of RecA protein (Adzuma, Genes Devel. 6:1679 (1992); Hsieh et al., PNAS USA 89:6492
(1992); Chiu et al., Biochemistry 32:13146 (1993)). This probe DNA instability of hybrids formed with
linear duplex DNA targets is most probably due to the incoming ssDNA probe W-C base pairing with the
complementary DNA strand of the duplex target and disrupting the base pairing in the other DNA
strand. The required high free-energy of maintaining a disrupted DNA strand in an unpaired ssDNA
conformation in a protein-free single-D-loop apparently can only be compensated for either by the
stored free energy inherent in negatively supercoiled DNA targets or by base pairing initiated at the
distal ends of the joint DNA molecule, allowing the exchanged strands to freely intertwine.

However, the addition of a second complementary ssDNA to the three-strand-containing single-D-loop
stabilizes the deproteinized hybrid joint molecules by allowing W-C base pairing of the probe with the
displaced target DNA strand. The addition of a second RecA-coated complementary ssDNA (cssDNA)
strand to the three-strand containing single D-loop stabilizes deproteinized hybrid joints located away
from the free ends of the duplex target DNA (Sena & Zarling, Nature Genetics 3:365 (1993); Revet et
al., J. Mol. Biol. 232:779 (1993); Jayasena and Johnston, J. Mol. Bio. 230:1015 (1993)). The resulting
four-stranded structure, named a double D-loop by analogy with the three-stranded single D-loop
hybrid, has been shown to be stable in the absence of RecA protein. This stability likely occurs because
the restoration of W-C basepairing in the parental duplex would require disruption of two W-C
basepairs in the double-D-loop (one W-C pair in each heteroduplex D-loop). Since each base-pairing
in the reverse transition (double-D-loop to duplex) is less favorable by the energy of one W-C basepair,
the pair of cssDNA probes are thus kinetically trapped in duplex DNA targets in stable hybrid structures.
The stability of the double-D loop joint molecule within internally located probe:target hybrids is an
intermediate stage prior to the progression of the homologous recombination reaction to the strand
exchange phase. The double D-loop permits isolation of stable multistranded DNA recombination
intermediates.

The invention may also be practiced with individual targeting polynucleotides which do not comprise
part of a complementary pair. In each case, a targeting polynucleotide is introduced into a target cell
simultaneously or contemporaneously with a recombinase protein, typically in the form of a
recombinase coated targeting polynucleotide as outlined herein (i.e., a polynucleotide pre-incubated
with recombinase wherein the recombinase is noncovalently bound to the polynucleotide; generally
referred to in the art as a nucleoprotein filament).

The present invention allows for the introduction of alterations in the target nucleic acid consensus
functional domain of a member of a gene family. That is, the fact that heterologies are tolerated in
targeting polynucleotides allows for two things: first, the use of a heterologous consensus homology
clamp that may target consensus functional domains of multiple genes, rather than a single gene,
resulting in a variety of genotypes and phenotypes, and secondly, the introduction of alterations to the
target sequence. Thus typically, a targeting polynucleotide (or complementary polynucleotide pair) has
a portion or region having a sequence that is not present in the preselected endogenous targeted
sequence(s) (i.e., a nonhomologous portion or mismatch) which may be as small as a single
mismatched nucleotide, several mismatches, or may span up to about several kilobases or more of
nonhomologous sequence.

Accordingly, in a preferred embodiment, the methods and compositions of the invention are used for
inactivation of a gene family gene. That is, exogenous targeting polynucleotides can be used to
inactivate, decrease or alter the biological activity of one or more genes in a cell (or transgenic
nonhuman animal or plant). This finds particular use in the generation of animal models of disease
states, or in the elucidation of gene function and activity, similar to "knock out" experiments.
Alternatively, the biological activity of the wild-type gene may be either decreased, or the wild-type
activity altered to mimic disease states. This includes genetic manipulation of non-coding gene
sequences that affect the transcription of genes, including promoters, repressors, enhancers and
transcriptional activating sequences.

Thus in a preferred embodiment, homologous recombination of the targeting polynucleotide and
endogenous target sequence will result in amino acid substitutions, insertions or deletions in the
endogenous target sequences, potentially both within the consensus functional domain region and
outside of it, for example as a result of the incorporation of PCR tags. This will generally result in
modulated or altered gene function of the endogenous gene, including both a decrease or elimination
of function as well as an enhancement of function. Nonhomologous portions are used to make
insertions, deletions, and/or replacements in a predetermined endogenous targeted DNA sequence,
and/or to make single or multiple nucleotide substitutions in a predetermined endogenous target DNA
sequence so that the resultant recombined sequence (i.e., a targeted recombinant endogenous
sequence) incorporates some or all of the sequence information of the nonhomologous portion of the
targeting polynucleotide(s). Thus, the nonhomologous regions are used to make variant sequences,
i.e. targeted sequence modifications. In this way, site directed modifications may be done in a variety of
systems for a variety of purposes.

The endogenous target sequence, generally a consensus functional domain, may be disrupted in a
variety of ways. The term "disrupt" as used herein comprises a change in the coding or non-coding
sequence of an endogenous nucleic acid. In one preferred embodiment, a disrupted gene will no
longer produce a functional gene product. In another preferred embodiment, a disrupted gene
produces a variant gene product. Generally, disruption may occur by either the substitution, insertion,
deletion or frame shifting of nucleotides.

In one embodiment, amino acid substitutions are made. This can be the result of either the
incorporation of a non-naturally occurring consensus sequence into a consensus target, or of more
specific changes to a particular sequence outside of the consensus sequence.

In one embodiment, the endogenous sequence is disrupted by an insertion sequence. The term
"insertion sequence" as used herein means one or more nucleotides which are inserted into an
endogenous gene to disrupt it. In general, insertion sequences can be as short as 1 nucleotide or as
long as a gene, as outlined herein. For non-gene insertion sequences, the sequences are at least 1
nucleotide, with from about 1 to about 50 nucleotides being preferred, and from about 10 to 25
nucleotides being particularly preferred. An insertion sequence may comprise a polylinker sequence,
with from about 1 to about 50 nucleotides being preferred, and from about 10 to 25 nucleotides being
particularly preferred. An insertion sequence may be a PCR tag used for identification of the first gene.

In a preferred embodiment, an insertion sequence comprises a gene which not only disrupts the
endogenous gene, thus preventing its expression, but also can result in the expression of a new gene
product. Thus, in a preferred embodiment, the disruption of an endogenous gene by an insertion
sequence gene is done in such a manner to allow the transcription and translation of the insertion gene.
An insertion sequence that encodes a gene may range from about 50 bp to 5000 bp of cDNA or about
5000 bp to 50000 bp of genomic DNA. As will be appreciated by those in the art, this can be done in a
variety of ways. In a preferred embodiment, the insertion gene is targeted to the endogenous gene in
such a manner as to utilize endogenous regulatory sequences, including promoters, enhancers or a
regulatory sequence. In an alternate embodiment, the insertion sequence gene includes its own
regulatory sequences, such as a promoter, enhancer or other regulatory sequence etc.

Particularly preferred insertion sequence genes include, but are not limited to, genes which encode
selection or reporter proteins. In addition, the insertion sequence genes may be modified or variant
genes.

The term "deletion" as used herein comprises removal of a portion of the nucleic acid sequence of an
endogenous gene. Deletions range from about 1 to about 100 nucleotides, with from about 1 to 50
nucleotides being preferred and from about 1 to about 25 nucleotides being particularly preferred,
although in some cases deletions may be much larger, and may effectively comprise the removal of the
entire consensus functional domain, the entire endogenous gene and/or its regulatory sequences.
Deletions may occur in combination with substitutions or modifications to arrive at a final modified
endogenous gene.

In a preferred embodiment endogenous genes may be disrupted simultaneously by an insertion and a 
deletion. For example, some or all of an endogenous gene, with or without its regulatory sequences, 
may be removed and replaced with an insertion sequence gene. Thus, for example, all but the 
regulatory sequences of an endogenous gene may be removed, and replaced with an insertion 
sequence gene, which is now under the control of the endogenous gene's regulatory elements. 

The term "regulatory element" is used herein to describe a non-coding sequence which affects the
transcription or translation of a gene including, but not limited to, promoter sequences, ribosomal
binding sites, transcriptional start and stop sequences, translational start and stop sequences, enhancer
or activator sequences, dimerizing sequences, etc. In a preferred embodiment, the regulatory
sequences include a promoter and transcriptional start and stop sequence. Promoter sequences
encode either constitutive or inducible promoters. The promoters may be either naturally occurring
promoters or hybrid promoters. Hybrid promoters, which combine elements of more than one
promoter, are also known in the art, and are useful in the present invention.

In addition, when the targeting polynucleotides are used to generate insertions or deletions in an
endogenous nucleic acid sequence, as is described herein, the use of two complementary
single-stranded targeting polynucleotides allows the use of internal homology clamps as depicted in the
figures of PCT US98/05223. The use of internal homology clamps allows the formation of stable
deproteinized cssDNA:probe target hybrids with homologous DNA sequences containing either
relatively small or large insertions and deletions within a homologous DNA target. Without being bound
by theory, it appears that these probe:target hybrids, with heterologous inserts in the cssDNA probe, are
stabilized by the re-annealing of cssDNA probes to each other within the double-D-loop hybrid, forming
a novel DNA structure with an internal homology clamp. Similarly stable double-D-loop hybrids formed
at internal sites with heterologous inserts in the linear DNA targets (with respect to the cssDNA probe)
are equally stable. Because cssDNA probes are kinetically trapped within the duplex target, the
multi-stranded DNA intermediates of homologous DNA pairing are stabilized and strand exchange is
facilitated.

In a preferred embodiment, the length of the internal homology clamp (i.e. the length of the insertion or
deletion) is from about 1 to 50% of the total length of the targeting polynucleotide, with from about 1 to
about 20% being preferred and from about 1 to about 10% being especially preferred, although in
some cases the length of the deletion or insertion may be significantly larger. As for the consensus
homology clamps, the complementarity within the internal homology clamp need not be perfect.
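
As a rough illustration of the percentages above, the following Python sketch computes what fraction of a targeting polynucleotide an internal homology clamp occupies and reports the narrowest stated range it falls in. The function name and tier labels are hypothetical and chosen only for this example; the text gives the ranges as approximate ("about"), whereas the sketch treats 1%, 10%, 20% and 50% as hard cut-offs for simplicity.

```python
def clamp_fraction_tier(clamp_len: int, polynucleotide_len: int) -> tuple[float, str]:
    """Return (percent of total length, tier label) for an internal homology clamp.

    Tier labels are illustrative only; cut-offs follow the ranges stated in the text.
    """
    if polynucleotide_len <= 0 or not 0 <= clamp_len <= polynucleotide_len:
        raise ValueError("need polynucleotide_len > 0 and 0 <= clamp_len <= polynucleotide_len")
    percent = 100.0 * clamp_len / polynucleotide_len
    if percent < 1:
        tier = "below stated range"
    elif percent <= 10:
        tier = "especially preferred"
    elif percent <= 20:
        tier = "preferred"
    elif percent <= 50:
        tier = "within stated range"
    else:
        tier = "larger than stated range"
    return percent, tier


if __name__ == "__main__":
    # A 30-nucleotide insertion carried on a 500-base targeting polynucleotide.
    print(clamp_fraction_tier(30, 500))
```

Because the text allows the insertion or deletion to be significantly larger in some cases, the last tier is reported rather than rejected.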

A targeting polynucleotide used in a method of the invention typically is a single-stranded nucleic acid,
usually a DNA strand, or derived by denaturation of a duplex DNA, which is complementary to one (or
both) strand(s) of the target duplex nucleic acid. Thus, one of the complementary single stranded
targeting polynucleotides is complementary to one strand of the endogenous target sequence (i.e.
Watson) and the other complementary single stranded targeting polynucleotide is complementary to
the other strand of the endogenous target sequence (i.e. Crick). The consensus homology clamp
sequence preferably contains at least 90-95% sequence homology with the target sequence (although
as outlined above, less sequence homology can be tolerated), to ensure sequence-specific targeting of
the targeting polynucleotide to the endogenous DNA consensus target. Each single-stranded targeting
polynucleotide is typically about 50-600 bases long, although a shorter or longer polynucleotide may
also be employed.
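
The 90-95% figure above can be checked numerically for a candidate clamp. The Python sketch below is a minimal illustration under a simplifying assumption: sequence homology is approximated as ungapped percent identity between two pre-aligned sequences of equal length. The function names and the example sequences are hypothetical.

```python
def percent_identity(clamp: str, target: str) -> float:
    """Percent of matching positions between two equal-length, pre-aligned sequences."""
    if not clamp or len(clamp) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(clamp.upper(), target.upper()))
    return 100.0 * matches / len(clamp)


def meets_preferred_homology(clamp: str, target: str, threshold: float = 90.0) -> bool:
    """True if identity reaches the lower end of the 90-95% range given in the text."""
    return percent_identity(clamp, target) >= threshold


if __name__ == "__main__":
    target = "ACGTACGTACGTACGTACGT"
    clamp = "ACGTACGTACGTACGTACGA"  # one mismatch in 20 positions
    print(percent_identity(clamp, target), meets_preferred_homology(clamp, target))
```

A real comparison would normally start from an alignment that permits gaps; this sketch deliberately skips that step.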

Once the gene family and consensus sequence are selected, the targeting polynucleotides are made, as
will be appreciated by those in the art. For example, for large targeting polynucleotides, plasmids are
engineered to contain an appropriately sized gene sequence with a deletion or insertion in the gene of
interest and at least one flanking homology clamp which substantially corresponds or is substantially
complementary to an endogenous target DNA sequence. Vectors containing a targeting polynucleotide
sequence are typically grown in E. coli and then isolated using standard molecular biology methods.
Alternatively, targeting polynucleotides may be prepared in single-stranded form by oligonucleotide
synthesis methods, which may first require, especially with larger targeting polynucleotides, formation of
subfragments of the targeting polynucleotide, typically followed by splicing of the subfragments
together, typically by enzymatic ligation. In general, as will be appreciated by those in the art, targeting
polynucleotides may be produced by chemical synthesis of oligonucleotides, nick-translation of a
double-stranded DNA template, polymerase chain-reaction amplification of a sequence (or ligase chain
reaction amplification), purification of prokaryotic or target cloning vectors harboring a sequence of
interest (e.g., a cloned cDNA or genomic clone, or portion thereof) such as plasmids, phagemids,
YACs, cosmids, bacteriophage DNA, other viral DNA or replication intermediates, or purified restriction
fragments thereof, as well as other sources of single and double-stranded polynucleotides having a
desired nucleotide sequence. When using microinjection procedures it may be preferable to use a
transfection technique with linearized sequences containing only modified target gene sequence and
without vector or selectable sequences. The modified gene site is such that a homologous
recombinant between the exogenous targeting polynucleotide and the endogenous DNA target
sequence can be identified by using carefully chosen primers and PCR, followed by analysis to detect if
PCR products specific to the desired targeted event are present (Erlich et al., (1991) Science 252:
1643, which is incorporated herein by reference). Several studies have already used PCR to
successfully identify and then clone the desired transfected cell lines (Zimmer and Gruss, (1989)
Nature 338: 150; Mouellic et al., (1990) Proc. Natl. Acad. Sci. USA 87: 4712; Shesely et al., (1991)
Proc. Natl. Acad. Sci. USA 88: 4294, which are incorporated herein by reference). This approach is
very effective when the number of cells receiving exogenous targeting polynucleotide(s) is high (i.e.,
with microinjection, or with liposomes) and the treated cell populations are allowed to expand to cell
groups of approximately 1x10^ cells (Capecchi, (1989) Science 244: 1288). When the target gene is
not on a sex chromosome, or the cells are derived from a female, both alleles of a gene can be
targeted by sequential inactivation (Mortensen et al., (1991) Proc. Natl. Acad. Sci. USA 88: 7036).
Alternatively, animals heterozygous for the target gene can be bred to homozygosity as is known in the
art.

In addition to consensus homology clamps and optional internal homology clamps, the targeting
polynucleotides of the invention may comprise additional components, such as cell-uptake
components, chemical substituents, purification tags, etc.

In a preferred embodiment, at least one of the targeting polynucleotides comprises at least one
cell-uptake component. As used herein, the term "cell-uptake component" refers to an agent which, when
bound, either directly or indirectly, to a targeting polynucleotide, enhances the intracellular uptake of the
targeting polynucleotide into at least one cell type (e.g., hepatocytes). A targeting polynucleotide of the
invention may optionally be conjugated, typically by covalent or preferably noncovalent binding, to a
cell-uptake component. Various methods have been described in the art for targeting DNA to specific
cell types. A targeting polynucleotide of the invention can be conjugated to essentially any of several
cell-uptake components known in the art. For targeting to hepatocytes, a targeting polynucleotide can
be conjugated to an asialoorosomucoid (ASOR)-poly-L-lysine conjugate by methods described in the
art and incorporated herein by reference (Wu GY and Wu CH (1987) J. Biol. Chem. 262: 4429; Wu GY
and Wu CH (1988) Biochemistry 27: 887; Wu GY and Wu CH (1988) J. Biol. Chem. 263: 14621; Wu GY
and Wu CH (1992) J. Biol. Chem. 267: 12436; Wu et al. (1991) J. Biol. Chem. 266: 14338; and Wilson
et al. (1992) J. Biol. Chem. 267: 963, WO92/06180; WO92/05250; and WO91/17761, which are
incorporated herein by reference).

Alternatively, a cell-uptake component may be formed by incubating the targeting polynucleotide with at
least one lipid species and at least one protein species to form protein-lipid-polynucleotide complexes
consisting essentially of the targeting polynucleotide and the lipid-protein cell-uptake component. Lipid
vesicles made according to Felgner (WO91/17424, incorporated herein by reference) and/or cationic
lipidization (WO91/16024, incorporated herein by reference) or other forms for polynucleotide
administration (EP 465,529, incorporated herein by reference) may also be employed as cell-uptake
components. Nucleases may also be used.

In addition to cell-uptake components, targeting components such as nuclear localization signals may
be used, as is known in the art. See for example Kido et al., Exper. Cell Res. 198:107-114 (1992),
hereby expressly incorporated by reference.

Typically, a targeting polynucleotide of the invention is coated with at least one recombinase and is
conjugated to a cell-uptake component, and the resulting cell targeting complex is contacted with a
target cell under uptake conditions (e.g., physiological conditions) so that the targeting polynucleotide
and the recombinase(s) are internalized in the target cell. A targeting polynucleotide may be contacted
simultaneously or sequentially with a cell-uptake component and also with a recombinase; preferably
the targeting polynucleotide is contacted first with a recombinase, or with a mixture comprising both a
cell-uptake component and a recombinase under conditions whereby, on average, at least about one
molecule of recombinase is noncovalently attached per targeting polynucleotide molecule and at least
about one cell-uptake component also is noncovalently attached. Most preferably, coating of both
recombinase and cell-uptake component saturates essentially all of the available binding sites on the
targeting polynucleotide. A targeting polynucleotide may be preferentially coated with a cell-uptake
component so that the resultant targeting complex comprises, on a molar basis, more cell-uptake
component than recombinase(s). Alternatively, a targeting polynucleotide may be preferentially coated
with recombinase(s) so that the resultant targeting complex comprises, on a molar basis, more
recombinase(s) than cell-uptake component.

Cell-uptake components are included with recombinase-coated targeting polynucleotides of the
invention to enhance the uptake of the recombinase-coated targeting polynucleotide(s) into cells,
particularly for in vivo gene targeting applications, such as gene therapy to treat genetic diseases,
including neoplasia, and targeted homologous recombination to treat viral infections wherein a viral
sequence (e.g., an integrated hepatitis B virus (HBV) genome or genome fragment) may be targeted by
homologous sequence targeting and inactivated. Alternatively, a targeting polynucleotide may be
coated with the cell-uptake component and targeted to cells with a contemporaneous or simultaneous
administration of a recombinase (e.g., liposomes or immunoliposomes containing a recombinase, a
viral-based vector encoding and expressing a recombinase).

In addition to recombinase and cellular uptake components, at least one of the targeting
polynucleotides may include chemical substituents. Exogenous targeting polynucleotides that have
been modified with appended chemical substituents may be introduced along with recombinase (e.g.,
recA) into a metabolically active target cell to homologously pair with a predetermined endogenous
DNA target sequence in the cell. In a preferred embodiment, the exogenous targeting polynucleotides
are derivatized, and additional chemical substituents are attached, either during or after polynucleotide
synthesis, respectively, and are thus localized to a specific endogenous target sequence where they
produce an alteration or chemical modification to a local DNA sequence. Preferred attached chemical
substituents include, but are not limited to: cross-linking agents (see Podyminogin et al., Biochem.
34:13098 (1995) and 35:7267 (1996), both of which are hereby incorporated by reference), nucleic acid
cleavage agents, metal chelates (e.g., iron/EDTA chelate for iron catalyzed cleavage), topoisomerases,
endonucleases, exonucleases, ligases, phosphodiesterases, photodynamic porphyrins,
chemotherapeutic drugs (e.g., adriamycin, doxorubicin), intercalating agents, labels, base-modification
agents, agents which normally bind to nucleic acids such as labels, etc. (see for example Afonina et al.,
PNAS USA 93:3199 (1996), incorporated herein by reference), immunoglobulin chains, and
oligonucleotides. Iron/EDTA chelates are particularly preferred chemical substituents where local
cleavage of a DNA sequence is desired (Hertzberg et al. (1982) J. Am. Chem. Soc. 104: 313;
Hertzberg and Dervan (1984) Biochemistry 23: 3934; Taylor et al. (1984) Tetrahedron 40: 457; Dervan,
PB (1986) Science 232: 464, which are incorporated herein by reference). Further preferred are
groups that prevent hybridization of the complementary single stranded nucleic acids to each other but
not to unmodified nucleic acids; see for example Kutryavin et al., Biochem. 35:11170 (1996) and Woo
et al., Nucleic Acid. Res. 24(13):2470 (1996), both of which are incorporated by reference. 2'-O methyl
groups are also preferred (see Cole-Strauss et al., Science 273:1386 (1996); Yoon et al., PNAS
93:2071 (1996)). Additional preferred chemical substituents include labeling moieties, including
fluorescent labels. Preferred attachment chemistries include: direct linkage, e.g., via an appended
reactive amino group (Corey and Schultz (1988) Science 238:1401, which is incorporated herein by
reference) and other direct linkage chemistries, although streptavidin/biotin and
digoxigenin/antidigoxigenin antibody linkage methods may also be used. Methods for linking chemical
substituents are provided in U.S. Patents 5,135,720, 5,093,245, and 5,055,556, which are incorporated
herein by reference. Other linkage chemistries may be used at the discretion of the practitioner.

In a preferred embodiment, at least one of the targeting polynucleotides comprises at least one
purification tag or capture moiety, some of which are discussed above as chemical substituents, for
example biotin, digoxigenin, psoralen, etc. Alternatively, the consensus oligonucleotide could be directly
attached to beads with the targeting reaction performed on a solid phase support.

In a preferred embodiment, the targeting polynucleotides are coated with recombinase prior to
introduction to the consensus target. The conditions used to coat targeting polynucleotides with
recombinases such as recA protein and ATPγS have been described in commonly assigned U.S.S.N.
07/910,791, filed 9 July 1992; U.S.S.N. 07/755,462, filed 4 September 1991; and U.S.S.N. 07/520,321,
filed 7 May 1990, and PCT US98/05223, each incorporated herein by reference. The procedures
below are directed to the use of E. coli recA, although as will be appreciated by those in the art, other
recombinases may be used as well. Targeting polynucleotides can be coated using GTPγS, mixes of
ATPγS with rATP, rGTP and/or dATP, or dATP or rATP alone in the presence of an rATP generating
system (Boehringer Mannheim). Various mixtures of GTPγS, ATPγS, ATP, ADP, dATP and/or rATP or
other nucleosides may be used; particularly preferred are mixes of ATPγS and ATP or ATPγS and
ADP.

RecA protein coating of targeting polynucleotides is typically carried out as described in U.S.S.N.
07/910,791, filed 9 July 1992 and U.S.S.N. 07/755,462, filed 4 September 1991, and PCT US98/05223,
which are incorporated herein by reference. Briefly, the targeting polynucleotide, whether
double-stranded or single-stranded, is denatured by heating in an aqueous solution at 95-100°C for five
minutes, then placed in an ice bath for 20 seconds to about one minute followed by centrifugation at
0°C for approximately 20 sec, before use. When denatured targeting polynucleotides are not placed in
a freezer at -20°C they are usually immediately added to standard recA coating reaction buffer
containing ATPγS, at room temperature, and to this is added the recA protein. Alternatively, recA
protein may be included with the buffer components and ATPγS before the polynucleotides are added.

RecA coating of targeting polynucleotide(s) is initiated by incubating polynucleotide-recA mixtures at
37°C for 10-15 min. RecA protein concentration tested during reaction with polynucleotide varies
depending upon polynucleotide size and the amount of added polynucleotide, and the ratio of recA
molecule:nucleotide preferably ranges between about 3:1 and 1:3. When single-stranded
polynucleotides are recA coated independently of their homologous polynucleotide strands, the mM
and μM concentrations of ATPγS and recA, respectively, can be reduced to one-half those used with
double-stranded targeting polynucleotides (i.e., recA and ATPγS concentration ratios are usually kept
constant at a specific concentration of individual polynucleotide strand, depending on whether a single-
or double-stranded polynucleotide is used).

RecA protein coating of targeting polynucleotides is normally carried out in a standard 1X RecA coating
reaction buffer. 10X RecA reaction buffer (i.e., 10X AC buffer) consists of: 100 mM Tris acetate (pH 7.5
at 37°C), 20 mM magnesium acetate, 500 mM sodium acetate, 10 mM DTT, and 50% glycerol. All of
the targeting polynucleotides, whether double-stranded or single-stranded, typically are denatured
before use by heating to 95-100°C for five minutes, placed on ice for one minute, and subjected to
centrifugation (10,000 rpm) at 0°C for approximately 20 seconds (e.g., in a Tomy centrifuge).
Denatured targeting polynucleotides usually are added immediately to room temperature RecA coating
reaction buffer mixed with ATPγS and diluted with double-distilled H2O as necessary.
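
The 1X working concentrations implied by the 10X AC buffer recipe above follow from a simple tenfold dilution. The Python sketch below works through that arithmetic; the dictionary keys and function names are hypothetical, and only the 10X concentrations are taken from the text.

```python
# 10X AC buffer composition as stated in the text (units are part of each key).
TEN_X_AC_BUFFER = {
    "Tris acetate (mM)": 100.0,
    "magnesium acetate (mM)": 20.0,
    "sodium acetate (mM)": 500.0,
    "DTT (mM)": 10.0,
    "glycerol (%)": 50.0,
}


def dilute(stock: dict[str, float], fold: float = 10.0) -> dict[str, float]:
    """Concentration of each component after a `fold`-fold dilution of the stock."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return {name: conc / fold for name, conc in stock.items()}


def stock_volume_ul(final_volume_ul: float, fold: float = 10.0) -> float:
    """Volume of concentrated stock needed to make `final_volume_ul` of 1X buffer."""
    return final_volume_ul / fold


if __name__ == "__main__":
    for name, conc in dilute(TEN_X_AC_BUFFER).items():
        print(f"{name}: {conc:g}")
    print("10X stock for a 50 ul reaction:", stock_volume_ul(50.0), "ul")
```

The remaining volume is made up by the other reaction components and double-distilled water, as the text describes.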

A reaction mixture typically contains the following components: (i) 0.2-4.8 mM ATPγS; and (ii) between
1-100 ng/μl of targeting polynucleotide. To this mixture is added about 1-20 μl of recA protein per
10-100 μl of reaction mixture, usually at about 2-10 mg/ml (purchased from Pharmacia or purified), and
is rapidly added and mixed. The final reaction volume for RecA coating of targeting polynucleotide is
usually in the range of about 10-500 μl. RecA coating of targeting polynucleotide is usually initiated by
incubating targeting polynucleotide-RecA mixtures at 37°C for about 10-15 min.
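
To relate the recA stock concentration quoted above (in mg/ml) to a molar concentration in the reaction, the following Python sketch converts units and computes the stock volume for a chosen final concentration. It assumes a monomer mass of roughly 38 kDa for E. coli RecA; that value is not stated in the text and is labeled as an assumption in the code. The function names are hypothetical.

```python
RECA_MW_G_PER_MOL = 38_000.0  # assumption: approximate monomer mass of E. coli RecA


def stock_conc_um(stock_mg_per_ml: float) -> float:
    """Convert a recA stock from mg/ml to micromolar.

    mg/ml equals g/L; dividing by g/mol gives mol/L; multiplying by 1e6 gives uM.
    """
    return stock_mg_per_ml / RECA_MW_G_PER_MOL * 1e6


def reca_volume_ul(target_um: float, final_volume_ul: float, stock_mg_per_ml: float) -> float:
    """Volume of stock (ul) giving `target_um` recA in a reaction of `final_volume_ul`."""
    return target_um * final_volume_ul / stock_conc_um(stock_mg_per_ml)


if __name__ == "__main__":
    # Example: 3.8 mg/ml stock, 50 ul reaction, 10 uM final recA.
    print(round(stock_conc_um(3.8), 1), "uM stock")
    print(round(reca_volume_ul(10.0, 50.0, 3.8), 2), "ul of stock")
```

With a stock in the 2-10 mg/ml range, the computed volumes for typical reaction sizes fall within the 1-20 μl per 10-100 μl stated in the text.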

RecA protein concentrations in coating reactions vary depending upon targeting polynucleotide size
and the amount of added targeting polynucleotide: recA protein concentrations are typically in the
range of 5 to 50 μM. When single-stranded targeting polynucleotides are coated with recA,
independently of their complementary strands, the concentrations of ATPγS and recA protein may
optionally be reduced to about one-half of the concentrations used with double-stranded targeting
polynucleotides of the same length: that is, the recA protein and ATPγS concentration ratios are
generally kept constant for a given concentration of individual polynucleotide strands.
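
The recA monomer-to-nucleotide ratio discussed in the coating procedure (preferably between about 3:1 and 1:3) can be estimated from the concentrations given here. The Python sketch below is one way to do that arithmetic. It assumes an average mass of about 330 g/mol per nucleotide of single-stranded DNA, a common approximation that is not taken from the text, and it treats the ratio limits as hard bounds. All names are hypothetical.

```python
AVG_NT_MASS_G_PER_MOL = 330.0  # assumption: average mass of one ssDNA nucleotide


def nucleotide_conc_um(dna_ng_per_ul: float) -> float:
    """Micromolar concentration of nucleotides for DNA given in ng/ul.

    ng/ul equals mg/L (1e-3 g/L); divide by g/mol for mol/L, then scale to uM.
    """
    return dna_ng_per_ul * 1000.0 / AVG_NT_MASS_G_PER_MOL


def reca_per_nucleotide(reca_um: float, dna_ng_per_ul: float) -> float:
    """Molar ratio of recA monomers to nucleotides in the coating reaction."""
    return reca_um / nucleotide_conc_um(dna_ng_per_ul)


def in_preferred_ratio(ratio: float) -> bool:
    """True if the ratio lies between 1:3 and 3:1 (recA:nucleotide)."""
    return 1.0 / 3.0 <= ratio <= 3.0


if __name__ == "__main__":
    ratio = reca_per_nucleotide(reca_um=50.0, dna_ng_per_ul=33.0)
    print(round(ratio, 2), in_preferred_ratio(ratio))
```

Halving both the recA and ATPγS concentrations for single-stranded polynucleotides, as described above, leaves the recA:ATPγS proportion unchanged while halving the recA:nucleotide ratio at a fixed DNA amount.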

The coating of targeting polynucleotides with recA protein can be evaluated in a number of ways. First,
protein binding to DNA can be examined using band-shift gel assays (McEntee et al., (1981) J. Biol.
Chem. 256: 8835). Labeled polynucleotides can be coated with recA protein in the presence of ATPγS
and the products of the coating reactions may be separated by agarose gel electrophoresis. Following
incubation of recA protein with denatured duplex DNAs, the recA protein effectively coats
single-stranded targeting polynucleotides derived from denaturing a duplex DNA. As the ratio of recA
protein monomers to nucleotides in the targeting polynucleotide increases from 0, 1:27, 1:2.7 to 3.7:1
for 121-mer and 0, 1:22, 1:2.2 to 4.5:1 for 159-mer, the targeting polynucleotide's electrophoretic mobility
decreases, i.e., is retarded, due to recA-binding to the targeting polynucleotide. Retardation of the
coated polynucleotide's mobility reflects the saturation of targeting polynucleotide with recA protein. An
excess of recA monomers to DNA nucleotides is required for efficient recA coating of short targeting
polynucleotides (Leahy et al., (1986) J. Biol. Chem. 261: 6954).

A second method for evaluating protein binding to DNA is in the use of nitrocellulose filter binding
assays (Leahy et al., (1986) J. Biol. Chem. 261:6954; Woodbury, et al., (1983) Biochemistry
22(20):4730-4737). The nitrocellulose filter binding method is particularly useful in determining the
dissociation-rates for protein:DNA complexes using labeled DNA. In the filter binding assay,
DNA:protein complexes are retained on a filter while free DNA passes through the filter. This assay
method is more quantitative for dissociation-rate determinations because the separation of DNA:protein
complexes from free targeting polynucleotide is very rapid.

Alternatively, recombinase protein(s) (prokaryotic, eukaryotic or endogenous to the target cell) may be
exogenously induced or administered to a target cell simultaneously or contemporaneously (i.e., within
about a few hours) with the targeting polynucleotide(s). Such administration is typically done by
microinjection, although electroporation, lipofection, and other transfection methods known in the art may
also be used. Alternatively, recombinase proteins may be produced in vivo. For example, they may be
produced from a homologous or heterologous expression cassette in a transfected cell or targeted cell,
such as a transgenic totipotent cell (e.g. a fertilized zygote) or an embryonal stem cell (e.g., a murine
ES cell such as AB-1) used to generate a transgenic non-human animal line or a somatic cell or a
pluripotent hematopoietic stem cell for reconstituting all or part of a particular stem cell population (e.g.
hematopoietic) of an individual. Conveniently, a heterologous expression cassette includes a
modulatable promoter, such as an ecdysone-inducible promoter-enhancer combination, an
estrogen-induced promoter-enhancer combination, a CMV promoter-enhancer, an insulin gene promoter, or
other cell-type specific, developmental stage-specific, hormone-inducible, drug inducible, such as tetra
or other modulatable promoter construct so that expression of at least one species of recombinase
protein from the cassette can be modulated for transiently producing recombinase(s) in vivo
simultaneous or contemporaneous with introduction of a targeting polynucleotide into the cell. When a
hormone-inducible promoter-enhancer combination is used, the cell must have the required hormone
receptor present, either naturally or as a consequence of expression of a co-transfected expression vector
encoding such receptor. Alternatively, the recombinase may be endogenous and produced in high
levels. In this embodiment, preferably in eukaryotic target cells such as tumor cells, the target cells
produce an elevated level of recombinase. In other embodiments the level of recombinase may be
induced by DNA damaging agents, such as mitomycin C, UV or γ-irradiation. Alternatively,
recombinase levels may be elevated by transfection of a plasmid encoding the recombinase gene into
the cell.

Once made, the compositions of the invention find use in a number of applications upon administration
to target cells. In general, the compositions and methods of the invention are useful to identify new
members of gene families which may be useful in functional genomic studies as well as in the
identification of new drug targets; both of these may be accomplished through the generation of "knock
out" animal models. In addition, the present invention allows the modification of consensus functional
domain targets, the creation of transgenic plants and animals, the cloning of genes containing
consensus functional domains, etc.

In a preferred embodiment, the present invention finds use in the isolation of new members of gene
families. As is generally depicted in Figure 2, the use of HMT filaments (i.e. consensus homology
clamps preferably containing a purification tag such as biotin, digoxigenin, or one purification method
such as the use of a recA antibody), allows the identification of new genes within the gene family. Once
identified, the new genes can be cloned, sequenced and the protein gene products purified. As will be
appreciated by those in the art, the functional importance of the new genes can be assessed in a
number of ways, including functional studies on the protein level, as well as the generation of "knock
out" animal models. By choosing consensus sequences for therapeutically relevant gene families,
novel targets can be identified that can be used in screening of drug candidates.

Thus, in a preferred embodiment, the present invention provides methods for isolating new members of
gene families comprising introducing targeting polynucleotides comprising consensus homology clamps
and at least one purification tag, preferably biotin, to a mix of nucleic acid, such as a plasmid cDNA
library or a cell, and then utilizing the purification tag to isolate the gene(s). The exact methods will
depend on the purification tag; a preferred method utilizes the attachment of the binding ligand for the
tag to a bead, which is then used to pull out the sequence. Alternatively, anti-recA antibodies could be
used to capture recA-coated probes. The genes are then cloned, sequenced, and reassembled if
necessary, as is well known in the art.

In an alternate preferred embodiment, the present invention finds use in functional genomic studies, by
providing the creation of transgenic animal models of disease. Thus, for example, HMTs used in
homologous recombination methods can generate animals that have a wide variety of mutations in a
wide variety of related genes, potentially resulting in a wide variety of phenotypes, including phenotypes
related to disease states. That is, by targeting a gene family, one, two or multiple genes in the family
may be altered in any given experiment, thus creating a wide variety of genotypes and phenotypes to
evaluate. Thus, in a preferred embodiment, the compositions and methods of the invention are used to
generate pools or libraries of variant nucleic acid sequences, wherein the mutations are within the
consensus functional domain coding region, cellular libraries containing the variant libraries, and
libraries of animals containing the variant libraries.

Furthermore, HMT targeting can be used in cells or animals that are diseased or altered; in essence,
HMT targeting can be done to identify "reversion" genes, genes that can modulate disease states
caused by different genes, either genes within the same gene family or a completely different gene
family. Thus, for example, the loss of one type of enzymatic activity, resulting in a disease phenotype,
may be compensated by alterations in a different but homologous enzymatic activity.

Accordingly, once the recombinase-targeting polynucleotide compositions are formulated, they are
introduced or administered into target cells. The administration is typically done as is known for the
administration of nucleic acids into cells, and, as those skilled in the art will appreciate, the methods
may depend on the choice of the target cell. Suitable methods include, but are not limited to,
microinjection, electroporation, lipofection, etc. By "target cells" herein is meant prokaryotic or
eukaryotic cells. Suitable prokaryotic cells include, but are not limited to, bacteria such as E. coli,
Bacillus species, and the extremophile bacteria such as thermophiles, halophiles, etc. Preferably, the
procaryotic target cells are recombination competent. Suitable eukaryotic cells include, but are not
limited to, fungi such as yeast and filamentous fungi, including species of Aspergillus, Trichoderma, and
Neurospora; plant cells including those of corn, sorghum, tobacco, canola, soybean, cotton, tomato,
potato, alfalfa, sunflower, etc.; and animal cells, including fish, reptiles, amphibia, birds and mammals.
Suitable fish cells include, but are not limited to, those from species of salmon, trout, tilapia, tuna, carp,
flounder, halibut, swordfish, cod and zebrafish. Suitable bird cells include, but are not limited to, those
of chickens, ducks, quail, pheasants, ostrich, and turkeys, and other jungle fowl or game birds. Suitable
mammalian cells include, but are not limited to, cells from horses, cows, buffalo, deer, sheep, rabbits,
rodents such as mice, rats, hamsters and guinea pigs, goats, pigs, primates, marine mammals
including dolphins and whales, as well as cell lines, such as human cell lines of any tissue or stem cell
type, and stem cells, including pluripotent and non-pluripotent, and non-human zygotes. Particular
human cells include, but are not limited to, tumor cells of all types (particularly melanoma, myeloid
leukemia, carcinomas of the lung, breast, ovaries, colon, kidney, prostate, pancreas and testes),
cardiomyocytes, endothelial cells, epithelial cells, lymphocytes (T-cell and B cell), mast cells,
eosinophils, vascular intimal cells, hepatocytes, leukocytes including mononuclear leukocytes, stem
cells such as haemopoietic, neural, skin, lung, kidney, liver and myocyte stem cells, osteoclasts,
chondrocytes and other connective tissue cells, keratinocytes, melanocytes, liver cells, kidney cells, and
adipocytes. Suitable cells also include known research cells, including, but not limited to, Jurkat T cells,
mouse La, HT1080, C127, Rat2, CV-1, NIH3T3 cells, CHO, COS, 293 cells, etc. See the ATCC cell
line catalog, hereby expressly incorporated by reference.

In a preferred embodiment, procaryotic cells are used to identify, clone, or alter gene family members.
In this embodiment, a pre-selected target DNA sequence is chosen for alteration. Preferably, the
pre-selected target DNA sequence is contained within an extrachromosomal sequence. By
"extrachromosomal sequence" herein is meant a sequence separate from the chromosomal or
genomic sequences. Preferred extrachromosomal sequences include plasmids (particularly procaryotic
plasmids such as bacterial plasmids), P1 vectors, viral genomes, yeast, bacterial and mammalian
artificial chromosomes (YAC, BAC and MAC, respectively), and other autonomously self-replicating
sequences, although this is not required. As described herein, a recombinase and at least two single
stranded targeting polynucleotides which are substantially complementary to each other, each of which
contains a homology clamp to the target sequence contained on the extrachromosomal sequence, are
added to the extrachromosomal sequence, preferably in vitro. The two single stranded targeting
polynucleotides are preferably coated with recombinase, and at least one of the targeting
polynucleotides contains at least one nucleotide substitution, insertion or deletion. The targeting
polynucleotides then bind to the target sequence in the extrachromosomal sequence to effect
homologous recombination and form an altered extrachromosomal sequence which contains the
substitution, insertion or deletion. The altered extrachromosomal sequence is then introduced into the
procaryotic cell using techniques known in the art. Preferably, the recombinase is removed prior to
introduction into the target cell, using techniques known in the art. For example, the reaction may be
treated with proteases such as proteinase K, detergents such as SDS, and phenol extraction (including
phenol:chloroform:isoamyl alcohol extraction). These methods may also be used for eukaryotic cells.

Alternatively, the pre-selected target DNA sequence is a chromosomal sequence. In this embodiment,
the recombinase with the targeting polynucleotides are introduced into the target cell, preferably
eukaryotic target cells. In this embodiment, it may be desirable to bind (generally non-covalently) a
nuclear localization signal to the targeting polynucleotides to facilitate localization of the complexes in
the nucleus. See for example Kido et al., Exper. Cell Res. 198:107-114 (1992), hereby expressly
incorporated by reference. The targeting polynucleotides and the recombinase function to effect
homologous recombination, resulting in altered chromosomal or genomic sequences.

In a preferred embodiment, eukaryotic cells are used. For making transgenic non-human animals
(which include homologously targeted non-human animals), embryonal stem cells (ES cells), donor
cells for nuclear transfer and fertilized zygotes are preferred. In a preferred embodiment, embryonal
stem cells are used. Murine ES cells, such as AB-1 line grown on mitotically inactive SNL76/7 cell
feeder layers (McMahon and Bradley, Cell 62:1073-1085 (1990)) essentially as described (Robertson,
E.J. (1987) in Teratocarcinomas and Embryonic Stem Cells: A Practical Approach, E.J. Robertson, ed.
(Oxford: IRL Press), p. 71-112; Zijlstra et al., Nature 342:435-438 (1989); and Schwartzberg et al.,
Science 246:799-803 (1989), each of which is incorporated herein by reference) may be used for
homologous gene targeting. Other suitable ES lines include, but are not limited to, the E14 line
(Hooper et al. (1987) Nature 326:292-295), the D3 line (Doetschman et al. (1985) J. Embryol. Exp.
Morph. 87:21-45), and the CCE line (Robertson et al. (1986) Nature 323:445-448). The success of
generating a mouse line from ES cells bearing a specific targeted mutation depends on the
pluripotence of the ES cells (i.e., their ability, once injected into a host blastocyst, to participate in
embryogenesis and contribute to the germ cells of the resulting animal).

The pluripotence of any given ES cell line can vary with time in culture and the care with which it has
been handled. The only definitive assay for pluripotence is to determine whether the specific population
of ES cells to be used for targeting can give rise to chimeras capable of germline transmission of the
ES genome. For this reason, prior to gene targeting, a portion of the parental population of AB-1 cells
is injected into C57Bl/6J blastocysts to ascertain whether the cells are capable of generating chimeric
mice with extensive ES cell contribution and whether the majority of these chimeras can transmit the ES
genome to progeny.

In a preferred embodiment, non-human zygotes are used, for example to make transgenic animals,
using techniques known in the art (see U.S. Patent No. 4,873,191; Brinster et al., PNAS 86:7007
(1989); Susulic et al., J. Biol. Chem. 49:29483 (1995), and Cavard et al., Nucleic Acids Res. 16:2099
(1988), hereby incorporated by reference). Preferred zygotes include, but are not limited to, animal
zygotes, including fish, avian, reptilian, amphibian and mammalian zygotes. Suitable fish zygotes
include, but are not limited to, those from species of salmon, trout, tuna, carp, flounder, halibut,
swordfish, cod, tilapia and zebrafish. Suitable bird zygotes include, but are not limited to, those of
chickens, ducks, quail, pheasant, turkeys, and other jungle fowl and game birds. Suitable mammalian
zygotes include, but are not limited to, cells from horses, cows, buffalo, deer, sheep, rabbits, rodents
such as mice, rats, hamsters and guinea pigs, goats, pigs, primates, and marine mammals including
dolphins and whales. See Hogan et al., Manipulating the Mouse Embryo (A Laboratory Manual), 2nd
Ed. Cold Spring Harbor Press, 1994, incorporated by reference.

The vectors containing the DNA segments of interest can be transferred into the host cell by well-known
methods, depending on the type of cellular host. For example, micro-injection is commonly utilized for
target cells, although calcium phosphate treatment, electroporation, lipofection, biolistics or viral-based
transfection also may be used. Other methods used to transform mammalian cells include the use of
Polybrene, protoplast fusion, and others (see, generally, Sambrook et al., Molecular Cloning: A
Laboratory Manual, 2d ed., 1989, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.,
which is incorporated herein by reference). Direct injection of DNA and/or recombinase-coated
targeting polynucleotides into target cells, such as skeletal or muscle cells, also may be used (Wolff et
al. (1990) Science 247:1465, which is incorporated herein by reference).

In a preferred embodiment, the precursor animals or cells already contain a disease allele. As used
herein, the term "disease allele" refers to an allele of a gene which is capable of producing a
recognizable disease. A disease allele may be dominant or recessive and may produce disease
directly or when present in combination with a specific genetic background or pre-existing pathological
condition. A disease allele may be present in the gene pool or may be generated de novo in an
individual by somatic mutation. For example and not limitation, disease alleles include: activated
oncogenes, a sickle cell anemia allele, a Tay-Sachs allele, a cystic fibrosis allele, a Lesch-Nyhan allele,
a retinoblastoma-susceptibility allele, a Fabry's disease allele, a Huntington's chorea allele, and a
xeroderma pigmentosum allele. As used herein, a disease allele encompasses both alleles associated
with human diseases and alleles associated with recognized veterinary diseases. For example, the
ΔF508 CFTR allele is a human disease allele which is associated with cystic fibrosis in North
Americans.

Once made and administered to target cells, new members of the gene family may be isolated as
outlined herein.

Alternatively, the target cells may be screened to identify a cell that contains the targeted consensus
functional domain sequence modification. This will be done in any number of ways, and will depend on
the target gene and targeting polynucleotides as will be appreciated by those in the art. The screen
may be based on phenotypic, biochemical, genotypic, or other functional changes, depending on the
target sequence. For example, IgE levels may be evaluated for inflammation or asthma; vascular tone
or blood pressure can be evaluated for hypertension; behavior screens can be done for neurologic
effects; lipoprotein profiles can be screened for cardiovascular effects; secreted molecules can be
evaluated for endocrine processes; CBCs can be done for hematology studies, etc. In an additional
embodiment, as will be appreciated by those in the art, selectable markers or marker sequences may
be included in the targeting polynucleotides to facilitate later identification.

In a preferred embodiment, kits containing the compositions of the invention are provided. The kits
include the compositions, particularly those of libraries or pools of degenerate cssDNA probes, along
with any number of reagents or buffers, including recombinases, buffers, salts, ATP, etc.

The broad scope of this invention is best understood with reference to the following examples, which
are not intended to limit the invention in any manner. All references cited herein are expressly
incorporated by reference. Although the present invention has been described in some detail by way of
illustration for purposes of clarity of understanding, it will be apparent that certain changes and
modifications may be practiced within the scope of the claims.

EXAMPLES 

Example 1

Calcitonin Type GPCR subfamily 

A Calcitonin type GPCR subfamily serves as an example. The first consensus motif used is "TWDGW",
for which degenerate oligonucleotide "ACNTGGGAYGGMTGG" is synthesized. The second consensus
motif is "GWGFP", for which antisense degenerate oligonucleotide "NGGRAANCCCCANCC" is
synthesized. The degeneracy of these oligos is 32 and 128 respectively, with each oligo containing a
biotin moiety at the 5' end. cDNA or a cDNA library is used as a template for PCR amplification using
the described oligonucleotides as primers. The double-stranded amplified product is thermally denatured,
cooled and coated with RecA as described. A cDNA library is used as substrate for targeting. After
binding the specific target plasmid and washing away nonspecific sequences, the bound material can be
analyzed. Bound plasmids are transformed into E. coli cells, with colony PCR performed using the
original oligonucleotides as primers. This particular example should yield a PCR product of about 600
base pairs depending on the family member isolated. Other screening procedures can also be used,
including but not limited to hybridization to homologous probes, complementation of cells mutant for a
family member, etc. Positive colonies (yielding efficient and specific amplification) are further analyzed
by sequence to identify family members. The DNA sequences can then be reverse transcribed by
computer analysis and compared to known protein sequences to determine if they represent known or
novel family members.
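
The relationship between the two consensus motifs and their degenerate oligonucleotides in Example 1 can be checked with a short sketch. It counts the sequences encoded by each oligo from its IUPAC ambiguity codes, expands them, and translates them with the standard genetic code. The helper names (`degeneracy`, `expand`, `reverse_complement`, `translate`) are assumptions introduced for illustration. Counting the ambiguity codes in the first oligo exactly as printed gives 16 rather than the stated 32; a value of 32 would result if the glycine codon were written GGN instead of GGM, so the printed oligo or the stated figure may not match the original filing.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of every IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def degeneracy(oligo):
    """Number of distinct sequences encoded by a degenerate oligo."""
    n = 1
    for base in oligo:
        n *= len(IUPAC[base])
    return n


def expand(oligo):
    """List every unambiguous sequence encoded by a degenerate oligo."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in oligo))]


def reverse_complement(oligo):
    """Reverse complement, preserving IUPAC ambiguity codes."""
    return oligo.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate an unambiguous DNA string in frame 1."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


sense = "ACNTGGGAYGGMTGG"        # oligo for motif TWDGW, as printed in Example 1
antisense = "NGGRAANCCCCANCC"    # antisense oligo for motif GWGFP

print(degeneracy(sense), degeneracy(antisense))
print({translate(s) for s in expand(sense)})
print({translate(s) for s in expand(reverse_complement(antisense))})
```

Every expanded sense sequence translates to TWDGW, and every sequence obtained from the reverse complement of the antisense oligo translates to GWGFP, so both oligos as printed do encode their stated motifs.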

Example 2 
α2-adrenergic receptors

Adrenergic receptors play a prominent role in a wide variety of physiological processes (Kobilka,
chapter 3). Examples of two well-characterized families of adrenergic receptors are the α2-adrenergic
receptors (α2-ARs) and the β-adrenergic receptors (β-ARs). α2-ARs play a major role in the
cardiovascular system and have profound, yet conflicting, effects on blood pressure. If α2-ARs are
stimulated in the brainstem, blood pressure decreases, whereas if α2-ARs are stimulated in smooth
muscle, blood pressure increases. The three subtypes of α2-ARs known to date, α2a, α2b and α2c,
show 50-60% homology to each other and may each contribute in differing degrees to these effects.
Our current understanding of the role of each receptor subtype comes from the analysis of animal
models in which each subtype was systematically knocked out. Link et al. (1995, 1996) show that
stimulation of α2b receptors in vascular smooth muscle produced hypertension and counteracted the
clinically beneficial hypotensive effect of stimulating α2a receptors in the central nervous system. Thus,
knowledge of the specific role of each receptor subtype and its interaction with other family members is
crucial to understanding the physiological significance of each as well as providing proper therapeutic
treatments for disease states.

α2-ARs impact several different physiological systems including the cardiovascular system. There is a
particular impact on vasoconstriction and vasodilation and the concomitant regulation of blood pressure.
Neural effects include such parameters as sympathetic outflow, sedation/anaesthesia and neurological
modulation; metabolic effects include decreased lipolysis, decreased insulin release, and stimulation of
pituitary GHRH release. Other miscellaneous effects include inhibition of gastric motility and/or acid
secretion, platelet aggregation and uterine contractility. The affected systems allow for easy
identification of HMT candidate animals. Microinjection of the consensus sequence (Fig. 1B) is followed
by screening the library of HMT mice having modifications in existing or new α2-ARs. The screening is
done using a variety of existing physiological assays such as blood pressure measurements. Knockout
animals from known receptor subtypes, as well as new family members of specific classes of receptors,
advance the understanding of the biological mechanisms controlled by each.

Example 3 
β-adrenergic receptors

At least three distinct beta-adrenergic receptor (β-AR) subtypes exist in mammals, which modulate a
wide variety of processes including cardiac function, development and behavior, metabolism and
smooth muscle tone. These subtypes, β1, β2, and β3 adrenergic receptors, share the consensus
sequence shown in Fig. 1B. While the β3 subtype appears to be primarily expressed in adipose tissue
where it may regulate metabolic processes, the functional contributions specific to either the β1 or β2
receptor have proven to be more difficult to assess, as some tissues express both receptor subtypes and
pharmacological agents used to dissect the relative contributions of different receptors are not always
subtype-specific. Again, as with α2-ARs, knockout systems have greatly increased our knowledge of
subtype-specific effects. Knockout animals have not only allowed assignation of function to individual
subtypes but also serve as a test for functional redundancy between subtypes. Rohrer et al. (1996)
have shown that the mouse β1 receptor plays a role in development, and regulates the chronotropic
and inotropic responses after administration of agonist. As described for α2-ARs, we can use similar
phenotypic screens to isolate, identify and determine function for members of the β-AR family.

Example 4 
14-3-3 Proteins 

A fundamental problem in drug discovery for cancer is that model systems are not predictive. Drug
candidates are tested in animals carrying transplanted human tumors (xenografts), but very few drugs
that show anticancer activity in xenografts have been successful in clinical trials. Furthermore, cancer is
a polygenic disease; hence, it is difficult to produce transgenic animal models for cancer with single
gene modifications.

Most cancers result from defects in DNA repair, cell cycle checkpoint and regulation or cell apoptosis.
Members of the 14-3-3 family are involved in many of these pathways. For instance, 14-3-3 proteins
are involved in cell cycle control. After DNA damage, 14-3-3 expression is increased by p53; this
results in the binding of 14-3-3 protein to phosphorylated Cdc25C, which in turn results in the
dephosphorylation of Cdc2, which finally causes the cell cycle to stop at G2 stage (Hermeking H.,
Molecular Cell, 1997, vol. 1, 3-11). 14-3-3 protein binds the phosphorylated BAD gene product, an
agonist of apoptosis (Zha, J., Cell, 1996, vol. 87, 619-628; Zha J., J. Biol. Chem., 1997, vol. 272,
24101-24104). 14-3-3 proteins also regulate Raf, Cbl and other oncogene activities (Geoffrey, J., Clark,
J. Biol. Chem., 1997, vol. 272, 20990-20993; Tzivion, G., Nature, 1998, vol. 394, 88-92). In addition,
14-3-3 protein expression is increased in bladder squamous cell carcinomas and lung tumor tissues
(Ostergaard, M., Cancer Res., 1997, vol. 57, 4111-4117; Nakanishi, K., Hum. Antibodies, 1997, vol. 8,
189-94).

Using 14-3-3 binding domains as a consensus probe for HMT targeting, several genes in the 14-3-3
family can be knocked out or modified at the same time to generate cancer models. In the 14-3-3 gene
family, the binding sites in 14-3-3 proteins are very conserved between species and various isoforms.
This conservation is more than 90% at the amino acid level, and more than 70% at DNA sequence level
(Figure 4). Targeting probes designed to substitute two basic amino acids (R, K) with acidic amino
acids (E, E) are shown in Figure 5. Recombinase proteins formulated with HMT probes allow toleration
of 30% heterologous sequences for homologous recombination. This probe still has more than 70%
homology to 14-3-3 family proteins, and it can target many 14-3-3 family genes.
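
The 30% heterology figure can be read as a simple threshold on percent identity between a probe and a candidate target. The sketch below is one minimal way to apply it, assuming an ungapped, position-by-position comparison of equal-length sequences; that simplification, the function names, and the two toy sequences are all assumptions introduced here and are not taken from Figures 4 or 5.

```python
def percent_identity(probe, target):
    """Percentage of matching positions between two equal-length aligned sequences."""
    if len(probe) != len(target) or not probe:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(probe.upper(), target.upper()))
    return 100.0 * matches / len(probe)


def within_tolerance(probe, target, max_heterology=30.0):
    """True if the mismatch fraction does not exceed the tolerated heterology (in percent)."""
    return 100.0 - percent_identity(probe, target) <= max_heterology


# Hypothetical 20-nt probe and target differing at three positions.
probe = "ATGGAGAAGACTGAGCTGAT"
target = "ATGGATAAGAGTGAACTGAT"

print(percent_identity(probe, target))
print(within_tolerance(probe, target))
```

In this toy case the probe differs from the target at 3 of 20 positions, so it falls inside the 30% heterology limit stated above. A real comparison would normally start from a gapped alignment rather than a direct position-wise match.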

HMT probes from the 14-3-3 gene family are introduced into normal mammalian cells, and 14-3-3 targeted
cells are screened by cell transformation assays. To further validate if particular 14-3-3 targeted
cells are important for cancerous phenotypes, targeted cells are transplanted into animals to test for
tumor formation. The genotype of HMT targeted cells is characterized by PCR and Southern blotting.

When HMT probes from the 14-3-3 gene family are introduced into cells or zygotes used to produce
transgenic animals, transgenic animal cancer models are screened by their sensitivity to
tumor-generating carcinogenic chemicals. In lung cancer models, transgenic mice are treated with
urethane. In leukemia models, transgenic mice are treated with γ-irradiation. For other cancers,
γ-irradiation or other tumor-generating chemicals are also to be used. The genotypes of HMT targeted
animals are characterized by PCR and Southern blotting.

CLAIMS

We claim:

1. A composition comprising at least one recombinase and at least two single-stranded targeting
polynucleotides which are substantially complementary to each other and each having a consensus
homology clamp for a gene family.

2. A composition according to claim 1 comprising at least one recombinase and a plurality of pairs of
single stranded targeting polynucleotides which are substantially complementary to each other and
each comprising a consensus homology clamp for a gene family, said plurality of pairs comprising a set
of degenerate probes encoding the consensus sequence.

3. A composition according to claim 1 or 2 wherein said gene family is selected from the group
consisting of the G-protein coupled receptor family, the AAA-protein family, the bZIP transcription factor
family, the mutS family, the recA family, the recF family, the Bcl-2 family, the single-stranded binding
protein family, the TFIID transcription family, the TGF-beta family, the TNF family, the XPA family, the
14-3-3 family, and the XPG family.

4. A composition according to claim 1, 2 or 3 wherein at least one of said polynucleotides further
comprises an insertion sequence.

5. A composition according to claim 1, 2, 3 or 4 wherein at least one of said polynucleotides further
comprises a purification tag.

6. A composition according to claim 1, 2, 3, 4 or 5 wherein said targeting polynucleotides are coated
with recombinase.

7. A composition according to claim 1, 2, 3, 4, 5 or 6 wherein said recombinase is a species of
prokaryotic recombinase.

8. A composition according to claim 1, 2, 3, 4, 5 or 6 wherein said recombinase is a species of
eukaryotic recombinase.

9. A kit comprising the composition of claim 1, 2, 3, 4, 5, 6, 7 or 8 and at least one reagent.

10. A method for targeting a sequence modification in at least one member of a consensus family of
genes in a cell by homologous recombination, said method comprising introducing into at least one cell
at least one recombinase and at least two single-stranded targeting polynucleotides which are
substantially complementary to each other and each having a consensus homology clamp for said
family.

WO 99/37755 PCT/US98/26498

11. A method according to claim 10 further comprising identifying a target cell having a targeted
sequence modification.

12. A method of making a non-human organism with a targeted sequence modification in at least one
member of a gene family, said method comprising

    a) introducing into a cell at least one recombinase and at least two single-stranded
       targeting polynucleotides which are substantially complementary to each other and
       each having a consensus homology clamp for said family; and
    b) subjecting said cell to conditions that result in the formation of an animal;

wherein said animal has at least one modification in at least one member of a consensus family of
genes.

13. A method according to claim 10, 11 or 12 wherein the targeted sequence modification comprises
the substitution of at least one nucleotide.

14. A method according to claim 10, 11, 12 or 13 wherein the targeted sequence modification
comprises a plurality of substitutions.

15. A method of isolating a member of a gene family comprising a protein consensus sequence, said
method comprising:

    a) adding to a complex mixture of nucleic acids
        i) at least one recombinase; and
        ii) at least two single-stranded targeting polynucleotides which are substantially
            complementary to each other and each having a consensus homology clamp
            for said family, wherein at least one of said targeting polynucleotides comprises
            a purification tag;
       under conditions whereby said targeting polynucleotides form a complex with said
       member; and
    b) isolating said member using said purification tag.

16. A method according to claim 10, 11, 12, 13, 14 or 15 wherein said targeting polynucleotides are
coated with said recombinase.

17. A method according to claim 10, 11, 12, 13, 14, 15 or 16 wherein the recombinase and the
targeting polynucleotides are introduced simultaneously.

18. A method according to claim 10, 11, 12, 13, 14, 15, 16 or 17 wherein said cell is a eukaryotic cell.

19. A method according to claim 10, 11, 12, 13, 14, 15, 16 or 17 wherein said cell is a procaryotic cell.

20. A method according to claim 10, 11, 12, 13, 14, 15, 16, 17, 18 or 19 wherein said cell is from an
organism with a genotypic disease state.

21. A method according to claim 15 wherein said complex mixture is a cDNA library or a cell.

22. A non-human organism containing a sequence modification in an endogenous consensus
functional domain of a gene member of a gene family.